-
Multimodal Atmospheric Super-Resolution With Deep Generative Models
Authors:
Dibyajyoti Chakraborty,
Haiwen Guan,
Jason Stock,
Troy Arcomano,
Guido Cervone,
Romit Maulik
Abstract:
Score-based diffusion modeling is a generative machine learning algorithm that can be used to sample from complex distributions. They achieve this by learning a score function, i.e., the gradient of the log-probability density of the data, and reversing a noising process using the same. Once trained, score-based diffusion models not only generate new samples but also enable zero-shot conditioning…
▽ More
Score-based diffusion modeling is a generative machine learning algorithm that can be used to sample from complex distributions. They achieve this by learning a score function, i.e., the gradient of the log-probability density of the data, and reversing a noising process using the same. Once trained, score-based diffusion models not only generate new samples but also enable zero-shot conditioning of the generated samples on observed data. This promises a novel paradigm for data and model fusion, wherein the implicitly learned distributions of pretrained score-based diffusion models can be updated given the availability of online data in a Bayesian formulation. In this article, we apply such a concept to the super-resolution of a high-dimensional dynamical system, given the real-time availability of low-resolution and experimentally observed sparse sensor measurements from multimodal data. Additional analysis on how score-based sampling can be used for uncertainty estimates is also provided. Our experiments are performed for a super-resolution task that generates the ERA5 atmospheric dataset given sparse observations from a coarse-grained representation of the same and/or from unstructured experimental observations of the IGRA radiosonde dataset. We demonstrate accurate recovery of the high dimensional state given multiple sources of low-fidelity measurements. We also discover that the generative model can balance the influence of multiple dataset modalities during spatiotemporal reconstructions.
△ Less
Submitted 28 June, 2025;
originally announced June 2025.
-
FIGNN: Feature-Specific Interpretability for Graph Neural Network Surrogate Models
Authors:
Riddhiman Raut,
Romit Maulik,
Shivam Barwey
Abstract:
This work presents a novel graph neural network (GNN) architecture, the Feature-specific Interpretable Graph Neural Network (FIGNN), designed to enhance the interpretability of deep learning surrogate models defined on unstructured grids in scientific applications. Traditional GNNs often obscure the distinct spatial influences of different features in multivariate prediction tasks. FIGNN addresses…
▽ More
This work presents a novel graph neural network (GNN) architecture, the Feature-specific Interpretable Graph Neural Network (FIGNN), designed to enhance the interpretability of deep learning surrogate models defined on unstructured grids in scientific applications. Traditional GNNs often obscure the distinct spatial influences of different features in multivariate prediction tasks. FIGNN addresses this limitation by introducing a feature-specific pooling strategy, which enables independent attribution of spatial importance for each predicted variable. Additionally, a mask-based regularization term is incorporated into the training objective to explicitly encourage alignment between interpretability and predictive error, promoting localized attribution of model performance. The method is evaluated for surrogate modeling of two physically distinct systems: the SPEEDY atmospheric circulation model and the backward-facing step (BFS) fluid dynamics benchmark. Results demonstrate that FIGNN achieves competitive predictive performance while revealing physically meaningful spatial patterns unique to each feature. Analysis of rollout stability, feature-wise error budgets, and spatial mask overlays confirm the utility of FIGNN as a general-purpose framework for interpretable surrogate modeling in complex physical domains.
△ Less
Submitted 12 June, 2025;
originally announced June 2025.
-
SALSA-RL: Stability Analysis in the Latent Space of Actions for Reinforcement Learning
Authors:
Xuyang Li,
Romit Maulik
Abstract:
Modern deep reinforcement learning (DRL) methods have made significant advances in handling continuous action spaces. However, real-world control systems--especially those requiring precise and reliable performance--often demand interpretability in the sense of a-priori assessments of agent behavior to identify safe or failure-prone interactions with environments. To address this limitation, we pr…
▽ More
Modern deep reinforcement learning (DRL) methods have made significant advances in handling continuous action spaces. However, real-world control systems--especially those requiring precise and reliable performance--often demand interpretability in the sense of a-priori assessments of agent behavior to identify safe or failure-prone interactions with environments. To address this limitation, we propose SALSA-RL (Stability Analysis in the Latent Space of Actions), a novel RL framework that models control actions as dynamic, time-dependent variables evolving within a latent space. By employing a pre-trained encoder-decoder and a state-dependent linear system, our approach enables interpretability through local stability analysis, where instantaneous growth in action-norms can be predicted before their execution. We demonstrate that SALSA-RL can be deployed in a non-invasive manner for assessing the local stability of actions from pretrained RL agents without compromising on performance across diverse benchmark environments. By enabling a more interpretable analysis of action generation, SALSA-RL provides a powerful tool for advancing the design, analysis, and theoretical understanding of RL systems.
△ Less
Submitted 25 May, 2025; v1 submitted 21 February, 2025;
originally announced February 2025.
-
CondensNet: Enabling stable long-term climate simulations via hybrid deep learning models with adaptive physical constraints
Authors:
Xin Wang,
Juntao Yang,
Jeff Adie,
Simon See,
Kalli Furtado,
Chen Chen,
Troy Arcomano,
Romit Maulik,
Gianmarco Mengaldo
Abstract:
Accurate and efficient climate simulations are crucial for understanding Earth's evolving climate. However, current general circulation models (GCMs) face challenges in capturing unresolved physical processes, such as cloud and convection. A common solution is to adopt cloud resolving models, that provide more accurate results than the standard subgrid parametrisation schemes typically used in GCM…
▽ More
Accurate and efficient climate simulations are crucial for understanding Earth's evolving climate. However, current general circulation models (GCMs) face challenges in capturing unresolved physical processes, such as cloud and convection. A common solution is to adopt cloud resolving models, that provide more accurate results than the standard subgrid parametrisation schemes typically used in GCMs. However, cloud resolving models, also referred to as super paramtetrizations, remain computationally prohibitive. Hybrid modeling, which integrates deep learning with equation-based GCMs, offers a promising alternative but often struggles with long-term stability and accuracy issues. In this work, we find that water vapor oversaturation during condensation is a key factor compromising the stability of hybrid models. To address this, we introduce CondensNet, a novel neural network architecture that embeds a self-adaptive physical constraint to correct unphysical condensation processes. CondensNet effectively mitigates water vapor oversaturation, enhancing simulation stability while maintaining accuracy and improving computational efficiency compared to super parameterization schemes.
We integrate CondensNet into a GCM to form PCNN-GCM (Physics-Constrained Neural Network GCM), a hybrid deep learning framework designed for long-term stable climate simulations in real-world conditions, including ocean and land. PCNN-GCM represents a significant milestone in hybrid climate modeling, as it shows a novel way to incorporate physical constraints adaptively, paving the way for accurate, lightweight, and stable long-term climate simulations.
△ Less
Submitted 18 February, 2025;
originally announced February 2025.
-
Binned Spectral Power Loss for Improved Prediction of Chaotic Systems
Authors:
Dibyajyoti Chakraborty,
Arvind T. Mohan,
Romit Maulik
Abstract:
Forecasting multiscale chaotic dynamical systems with deep learning remains a formidable challenge due to the spectral bias of neural networks, which hinders the accurate representation of fine-scale structures in long-term predictions. This issue is exacerbated when models are deployed autoregressively, leading to compounding errors and instability. In this work, we introduce a novel approach to…
▽ More
Forecasting multiscale chaotic dynamical systems with deep learning remains a formidable challenge due to the spectral bias of neural networks, which hinders the accurate representation of fine-scale structures in long-term predictions. This issue is exacerbated when models are deployed autoregressively, leading to compounding errors and instability. In this work, we introduce a novel approach to mitigate the spectral bias which we call the Binned Spectral Power (BSP) Loss. The BSP loss is a frequency-domain loss function that adaptively weighs errors in predicting both larger and smaller scales of the dataset. Unlike traditional losses that focus on pointwise misfits, our BSP loss explicitly penalizes deviations in the energy distribution across different scales, promoting stable and physically consistent predictions. We demonstrate that the BSP loss mitigates the well-known problem of spectral bias in deep learning. We further validate our approach for the data-driven high-dimensional time-series forecasting of a range of benchmark chaotic systems which are typically intractable due to spectral bias. Our results demonstrate that the BSP loss significantly improves the stability and spectral accuracy of neural forecasting models without requiring architectural modifications. By directly targeting spectral consistency, our approach paves the way for more robust deep learning models for long-term forecasting of chaotic dynamical systems.
△ Less
Submitted 16 May, 2025; v1 submitted 1 February, 2025;
originally announced February 2025.
-
Semi-Implicit Neural Ordinary Differential Equations
Authors:
Hong Zhang,
Ying Liu,
Romit Maulik
Abstract:
Classical neural ODEs trained with explicit methods are intrinsically limited by stability, crippling their efficiency and robustness for stiff learning problems that are common in graph learning and scientific machine learning. We present a semi-implicit neural ODE approach that exploits the partitionable structure of the underlying dynamics. Our technique leads to an implicit neural network with…
▽ More
Classical neural ODEs trained with explicit methods are intrinsically limited by stability, crippling their efficiency and robustness for stiff learning problems that are common in graph learning and scientific machine learning. We present a semi-implicit neural ODE approach that exploits the partitionable structure of the underlying dynamics. Our technique leads to an implicit neural network with significant computational advantages over existing approaches because of enhanced stability and efficient linear solves during time integration. We show that our approach outperforms existing approaches on a variety of applications including graph classification and learning complex dynamical systems. We also demonstrate that our approach can train challenging neural ODEs where both explicit methods and fully implicit methods are intractable.
△ Less
Submitted 15 December, 2024;
originally announced December 2024.
-
Improved deep learning of chaotic dynamical systems with multistep penalty losses
Authors:
Dibyajyoti Chakraborty,
Seung Whan Chung,
Ashesh Chattopadhyay,
Romit Maulik
Abstract:
Predicting the long-term behavior of chaotic systems remains a formidable challenge due to their extreme sensitivity to initial conditions and the inherent limitations of traditional data-driven modeling approaches. This paper introduces a novel framework that addresses these challenges by leveraging the recently proposed multi-step penalty (MP) optimization technique. Our approach extends the app…
▽ More
Predicting the long-term behavior of chaotic systems remains a formidable challenge due to their extreme sensitivity to initial conditions and the inherent limitations of traditional data-driven modeling approaches. This paper introduces a novel framework that addresses these challenges by leveraging the recently proposed multi-step penalty (MP) optimization technique. Our approach extends the applicability of MP optimization to a wide range of deep learning architectures, including Fourier Neural Operators and UNETs. By introducing penalized local discontinuities in the forecast trajectory, we effectively handle the non-convexity of loss landscapes commonly encountered in training neural networks for chaotic systems. We demonstrate the effectiveness of our method through its application to two challenging use-cases: the prediction of flow velocity evolution in two-dimensional turbulence and ocean dynamics using reanalysis data. Our results highlight the potential of this approach for accurate and stable long-term prediction of chaotic dynamics, paving the way for new advancements in data-driven modeling of complex natural phenomena.
△ Less
Submitted 7 October, 2024;
originally announced October 2024.
-
Scalable and Consistent Graph Neural Networks for Distributed Mesh-based Data-driven Modeling
Authors:
Shivam Barwey,
Riccardo Balin,
Bethany Lusch,
Saumil Patel,
Ramesh Balakrishnan,
Pinaki Pal,
Romit Maulik,
Venkatram Vishwanath
Abstract:
This work develops a distributed graph neural network (GNN) methodology for mesh-based modeling applications using a consistent neural message passing layer. As the name implies, the focus is on enabling scalable operations that satisfy physical consistency via halo nodes at sub-graph boundaries. Here, consistency refers to the fact that a GNN trained and evaluated on one rank (one large graph) is…
▽ More
This work develops a distributed graph neural network (GNN) methodology for mesh-based modeling applications using a consistent neural message passing layer. As the name implies, the focus is on enabling scalable operations that satisfy physical consistency via halo nodes at sub-graph boundaries. Here, consistency refers to the fact that a GNN trained and evaluated on one rank (one large graph) is arithmetically equivalent to evaluations on multiple ranks (a partitioned graph). This concept is demonstrated by interfacing GNNs with NekRS, a GPU-capable exascale CFD solver developed at Argonne National Laboratory. It is shown how the NekRS mesh partitioning can be linked to the distributed GNN training and inference routines, resulting in a scalable mesh-based data-driven modeling workflow. We study the impact of consistency on the scalability of mesh-based GNNs, demonstrating efficient scaling in consistent GNNs for up to O(1B) graph nodes on the Frontier exascale supercomputer.
△ Less
Submitted 2 October, 2024;
originally announced October 2024.
-
A competitive baseline for deep learning enhanced data assimilation using conditional Gaussian ensemble Kalman filtering
Authors:
Zachariah Malik,
Romit Maulik
Abstract:
Ensemble Kalman Filtering (EnKF) is a popular technique for data assimilation, with far ranging applications. However, the vanilla EnKF framework is not well-defined when perturbations are nonlinear. We study two non-linear extensions of the vanilla EnKF - dubbed the conditional-Gaussian EnKF (CG-EnKF) and the normal score EnKF (NS-EnKF) - which sidestep assumptions of linearity by constructing th…
▽ More
Ensemble Kalman Filtering (EnKF) is a popular technique for data assimilation, with far ranging applications. However, the vanilla EnKF framework is not well-defined when perturbations are nonlinear. We study two non-linear extensions of the vanilla EnKF - dubbed the conditional-Gaussian EnKF (CG-EnKF) and the normal score EnKF (NS-EnKF) - which sidestep assumptions of linearity by constructing the Kalman gain matrix with the `conditional Gaussian' update formula in place of the traditional one. We then compare these models against a state-of-the-art deep learning based particle filter called the score filter (SF). This model uses an expensive score diffusion model for estimating densities and also requires a strong assumption on the perturbation operator for validity. In our comparison, we find that CG-EnKF and NS-EnKF dramatically outperform SF for a canonical problem in high-dimensional multiscale data assimilation given by the Lorenz-96 system. Our analysis also demonstrates that the CG-EnKF and NS-EnKF can handle highly non-Gaussian additive noise perturbations, with the latter typically outperforming the former.
△ Less
Submitted 21 September, 2024;
originally announced September 2024.
-
Measure-Theoretic Time-Delay Embedding
Authors:
Jonah Botvinick-Greenhouse,
Maria Oprea,
Romit Maulik,
Yunan Yang
Abstract:
The celebrated Takens' embedding theorem provides a theoretical foundation for reconstructing the full state of a dynamical system from partial observations. However, the classical theorem assumes that the underlying system is deterministic and that observations are noise-free, limiting its applicability in real-world scenarios. Motivated by these limitations, we rigorously establish a measure-the…
▽ More
The celebrated Takens' embedding theorem provides a theoretical foundation for reconstructing the full state of a dynamical system from partial observations. However, the classical theorem assumes that the underlying system is deterministic and that observations are noise-free, limiting its applicability in real-world scenarios. Motivated by these limitations, we rigorously establish a measure-theoretic generalization that adopts an Eulerian description of the dynamics and recasts the embedding as a pushforward map between probability spaces. Our mathematical results leverage recent advances in optimal transportation theory. Building on our novel measure-theoretic time-delay embedding theory, we have developed a new computational framework that forecasts the full state of a dynamical system from time-lagged partial observations, engineered with better robustness to handle sparse and noisy data. We showcase the efficacy and versatility of our approach through several numerical examples, ranging from the classic Lorenz-63 system to large-scale, real-world applications such as NOAA sea surface temperature forecasting and ERA5 wind field reconstruction.
△ Less
Submitted 13 September, 2024;
originally announced September 2024.
-
Mesh-based Super-Resolution of Fluid Flows with Multiscale Graph Neural Networks
Authors:
Shivam Barwey,
Pinaki Pal,
Saumil Patel,
Riccardo Balin,
Bethany Lusch,
Venkatram Vishwanath,
Romit Maulik,
Ramesh Balakrishnan
Abstract:
A graph neural network (GNN) approach is introduced in this work which enables mesh-based three-dimensional super-resolution of fluid flows. In this framework, the GNN is designed to operate not on the full mesh-based field at once, but on localized meshes of elements (or cells) directly. To facilitate mesh-based GNN representations in a manner similar to spectral (or finite) element discretizatio…
▽ More
A graph neural network (GNN) approach is introduced in this work which enables mesh-based three-dimensional super-resolution of fluid flows. In this framework, the GNN is designed to operate not on the full mesh-based field at once, but on localized meshes of elements (or cells) directly. To facilitate mesh-based GNN representations in a manner similar to spectral (or finite) element discretizations, a baseline GNN layer (termed a message passing layer, which updates local node properties) is modified to account for synchronization of coincident graph nodes, rendering compatibility with commonly used element-based mesh connectivities. The architecture is multiscale in nature, and is comprised of a combination of coarse-scale and fine-scale message passing layer sequences (termed processors) separated by a graph unpooling layer. The coarse-scale processor embeds a query element (alongside a set number of neighboring coarse elements) into a single latent graph representation using coarse-scale synchronized message passing over the element neighborhood, and the fine-scale processor leverages additional message passing operations on this latent graph to correct for interpolation errors. Demonstration studies are performed using hexahedral mesh-based data from Taylor-Green Vortex and backward-facing step flow simulations at Reynolds numbers of 1600 and 3200. Through analysis of both global and local errors, the results ultimately show how the GNN is able to produce accurate super-resolved fields compared to targets in both coarse-scale and multiscale model configurations. Reconstruction errors for fixed architectures were found to increase in proportion to the Reynolds number. Geometry extrapolation studies on a separate cavity flow configuration show promising cross-mesh capabilities of the super-resolution strategy.
△ Less
Submitted 1 May, 2025; v1 submitted 12 September, 2024;
originally announced September 2024.
-
Higher order quantum reservoir computing for non-intrusive reduced-order models
Authors:
Vinamr Jain,
Romit Maulik
Abstract:
Forecasting dynamical systems is of importance to numerous real-world applications. When possible, dynamical systems forecasts are constructed based on first-principles-based models such as through the use of differential equations. When these equations are unknown, non-intrusive techniques must be utilized to build predictive models from data alone. Machine learning (ML) methods have recently bee…
▽ More
Forecasting dynamical systems is of importance to numerous real-world applications. When possible, dynamical systems forecasts are constructed based on first-principles-based models such as through the use of differential equations. When these equations are unknown, non-intrusive techniques must be utilized to build predictive models from data alone. Machine learning (ML) methods have recently been used for such tasks. Moreover, ML methods provide the added advantage of significant reductions in time-to-solution for predictions in contrast with first-principle based models. However, many state-of-the-art ML-based methods for forecasting rely on neural networks, which may be expensive to train and necessitate requirements for large amounts of memory. In this work, we propose a quantum mechanics inspired ML modeling strategy for learning nonlinear dynamical systems that provides data-driven forecasts for complex dynamical systems with reduced training time and memory costs. This approach, denoted the quantum reservoir computing technique (QRC), is a hybrid quantum-classical framework employing an ensemble of interconnected small quantum systems via classical linear feedback connections. By mapping the dynamical state to a suitable quantum representation amenable to unitary operations, QRC is able to predict complex nonlinear dynamical systems in a stable and accurate manner. We demonstrate the efficacy of this framework through benchmark forecasts of the NOAA Optimal Interpolation Sea Surface Temperature dataset and compare the performance of QRC to other ML methods.
△ Less
Submitted 31 July, 2024;
originally announced July 2024.
-
Divide And Conquer: Learning Chaotic Dynamical Systems With Multistep Penalty Neural Ordinary Differential Equations
Authors:
Dibyajyoti Chakraborty,
Seung Whan Chung,
Troy Arcomano,
Romit Maulik
Abstract:
Forecasting high-dimensional dynamical systems is a fundamental challenge in various fields, such as geosciences and engineering. Neural Ordinary Differential Equations (NODEs), which combine the power of neural networks and numerical solvers, have emerged as a promising algorithm for forecasting complex nonlinear dynamical systems. However, classical techniques used for NODE training are ineffect…
▽ More
Forecasting high-dimensional dynamical systems is a fundamental challenge in various fields, such as geosciences and engineering. Neural Ordinary Differential Equations (NODEs), which combine the power of neural networks and numerical solvers, have emerged as a promising algorithm for forecasting complex nonlinear dynamical systems. However, classical techniques used for NODE training are ineffective for learning chaotic dynamical systems. In this work, we propose a novel NODE-training approach that allows for robust learning of chaotic dynamical systems. Our method addresses the challenges of non-convexity and exploding gradients associated with underlying chaotic dynamics. Training data trajectories from such systems are split into multiple, non-overlapping time windows. In addition to the deviation from the training data, the optimization loss term further penalizes the discontinuities of the predicted trajectory between the time windows. The window size is selected based on the fastest Lyapunov time scale of the system. Multi-step penalty(MP) method is first demonstrated on Lorenz equation, to illustrate how it improves the loss landscape and thereby accelerates the optimization convergence. MP method can optimize chaotic systems in a manner similar to least-squares shadowing with significantly lower computational costs. Our proposed algorithm, denoted the Multistep Penalty NODE, is applied to chaotic systems such as the Kuramoto-Sivashinsky equation, the two-dimensional Kolmogorov flow, and ERA5 reanalysis data for the atmosphere. It is observed that MP-NODE provide viable performance for such chaotic systems, not only for short-term trajectory predictions but also for invariant statistics that are hallmarks of the chaotic nature of these dynamics.
△ Less
Submitted 15 October, 2024; v1 submitted 29 June, 2024;
originally announced July 2024.
-
A note on the error analysis of data-driven closure models for large eddy simulations of turbulence
Authors:
Dibyajyoti Chakraborty,
Shivam Barwey,
Hong Zhang,
Romit Maulik
Abstract:
In this work, we provide a mathematical formulation for error propagation in flow trajectory prediction using data-driven turbulence closure modeling. Under the assumption that the predicted state of a large eddy simulation prediction must be close to that of a subsampled direct numerical simulation, we retrieve an upper bound for the prediction error when utilizing a data-driven closure model. We…
▽ More
In this work, we provide a mathematical formulation for error propagation in flow trajectory prediction using data-driven turbulence closure modeling. Under the assumption that the predicted state of a large eddy simulation prediction must be close to that of a subsampled direct numerical simulation, we retrieve an upper bound for the prediction error when utilizing a data-driven closure model. We also demonstrate that this error is significantly affected by the time step size and the Jacobian which play a role in amplifying the initial one-step error made by using the closure. Our analysis also shows that the error propagates exponentially with rollout time and the upper bound of the system Jacobian which is itself influenced by the Jacobian of the closure formulation. These findings could enable the development of new regularization techniques for ML models based on the identified error-bound terms, improving their robustness and reducing error propagation.
△ Less
Submitted 29 May, 2024; v1 submitted 27 May, 2024;
originally announced May 2024.
-
LUCIE: A Lightweight Uncoupled ClImate Emulator with long-term stability and physical consistency for O(1000)-member ensembles
Authors:
Haiwen Guan,
Troy Arcomano,
Ashesh Chattopadhyay,
Romit Maulik
Abstract:
We present a lightweight, easy-to-train, low-resolution, fully data-driven climate emulator, LUCIE, that can be trained on as low as $2$ years of $6$-hourly ERA5 data. Unlike most state-of-the-art AI weather models, LUCIE remains stable and physically consistent for $100$ years of autoregressive simulation with $100$ ensemble members. Long-term mean climatology from LUCIE's simulation of temperatu…
▽ More
We present a lightweight, easy-to-train, low-resolution, fully data-driven climate emulator, LUCIE, that can be trained on as low as $2$ years of $6$-hourly ERA5 data. Unlike most state-of-the-art AI weather models, LUCIE remains stable and physically consistent for $100$ years of autoregressive simulation with $100$ ensemble members. Long-term mean climatology from LUCIE's simulation of temperature, wind, precipitation, and humidity matches that of ERA5 data, along with the variability. We further demonstrate how well extreme weather events and their return periods can be estimated from a large ensemble of long-term simulations. We further discuss an improved training strategy with a hard-constrained first-order integrator to suppress autoregressive error growth, a novel spectral regularization strategy to better capture fine-scale dynamics, and finally an optimization algorithm that enables data-limited (as low as $2$ years of $6$-hourly data) training of the emulator without losing stability and physical consistency. Finally, we provide a scaling experiment to compare the long-term bias of LUCIE with respect to the number of training samples. Importantly, LUCIE is an easy to use model that can be trained in just $2.4$h on a single A-100 GPU, allowing for multiple experiments that can explore important scientific questions that could be answered with large ensembles of long-term simulations, e.g., the impact of different variables on the simulation, dynamic response to external forcing, and estimation of extreme weather events, amongst others.
△ Less
Submitted 10 April, 2025; v1 submitted 25 May, 2024;
originally announced May 2024.
-
Leveraging Interpolation Models and Error Bounds for Verifiable Scientific Machine Learning
Authors:
Tyler Chang,
Andrew Gillette,
Romit Maulik
Abstract:
Effective verification and validation techniques for modern scientific machine learning workflows are challenging to devise. Statistical methods are abundant and easily deployed, but often rely on speculative assumptions about the data and methods involved. Error bounds for classical interpolation techniques can provide mathematically rigorous estimates of accuracy, but often are difficult or impr…
▽ More
Effective verification and validation techniques for modern scientific machine learning workflows are challenging to devise. Statistical methods are abundant and easily deployed, but often rely on speculative assumptions about the data and methods involved. Error bounds for classical interpolation techniques can provide mathematically rigorous estimates of accuracy, but often are difficult or impractical to determine computationally. In this work, we present a best-of-both-worlds approach to verifiable scientific machine learning by demonstrating that (1) multiple standard interpolation techniques have informative error bounds that can be computed or estimated efficiently; (2) comparative performance among distinct interpolants can aid in validation goals; (3) deploying interpolation methods on latent spaces generated by deep learning techniques enables some interpretability for black-box models. We present a detailed case study of our approach for predicting lift-drag ratios from airfoil images. Code developed for this work is available in a public Github repository.
△ Less
Submitted 7 February, 2025; v1 submitted 4 April, 2024;
originally announced April 2024.
-
Data-Driven Physics-Informed Neural Networks: A Digital Twin Perspective
Authors:
Sunwoong Yang,
Hojin Kim,
Yoonpyo Hong,
Kwanjung Yee,
Romit Maulik,
Namwoo Kang
Abstract:
This study explores the potential of physics-informed neural networks (PINNs) for the realization of digital twins (DT) from various perspectives. First, various adaptive sampling approaches for collocation points are investigated to verify their effectiveness in the mesh-free framework of PINNs, which allows automated construction of virtual representation without manual mesh generation. Then, th…
▽ More
This study explores the potential of physics-informed neural networks (PINNs) for the realization of digital twins (DT) from various perspectives. First, various adaptive sampling approaches for collocation points are investigated to verify their effectiveness in the mesh-free framework of PINNs, which allows automated construction of virtual representation without manual mesh generation. Then, the overall performance of the data-driven PINNs (DD-PINNs) framework is examined, which can utilize the acquired datasets in DT scenarios. Its scalability to more general physics is validated within parametric Navier-Stokes equations, where PINNs do not need to be retrained as the Reynolds number varies. In addition, since datasets can be often collected from different fidelity/sparsity in practice, multi-fidelity DD-PINNs are also proposed and evaluated. They show remarkable prediction performance even in the extrapolation tasks, with $42\sim62\%$ improvement over the single-fidelity approach. Finally, the uncertainty quantification performance of multi-fidelity DD-PINNs is investigated by the ensemble method to verify their potential in DT, where an accurate measure of predictive uncertainty is critical. The DD-PINN frameworks explored in this study are found to be more suitable for DT scenarios than traditional PINNs from the above perspectives, bringing engineers one step closer to seamless DT realization.
△ Less
Submitted 19 May, 2024; v1 submitted 5 January, 2024;
originally announced January 2024.
-
Scaling transformer neural networks for skillful and reliable medium-range weather forecasting
Authors:
Tung Nguyen,
Rohan Shah,
Hritik Bansal,
Troy Arcomano,
Romit Maulik,
Veerabhadra Kotamarthi,
Ian Foster,
Sandeep Madireddy,
Aditya Grover
Abstract:
Weather forecasting is a fundamental problem for anticipating and mitigating the impacts of climate change. Recently, data-driven approaches for weather forecasting based on deep learning have shown great promise, achieving accuracies that are competitive with operational systems. However, those methods often employ complex, customized architectures without sufficient ablation analysis, making it…
▽ More
Weather forecasting is a fundamental problem for anticipating and mitigating the impacts of climate change. Recently, data-driven approaches for weather forecasting based on deep learning have shown great promise, achieving accuracies that are competitive with operational systems. However, those methods often employ complex, customized architectures without sufficient ablation analysis, making it difficult to understand what truly contributes to their success. Here we introduce Stormer, a simple transformer model that achieves state-of-the-art performance on weather forecasting with minimal changes to the standard transformer backbone. We identify the key components of Stormer through careful empirical analyses, including weather-specific embedding, randomized dynamics forecast, and pressure-weighted loss. At the core of Stormer is a randomized forecasting objective that trains the model to forecast the weather dynamics over varying time intervals. During inference, this allows us to produce multiple forecasts for a target lead time and combine them to obtain better forecast accuracy. On WeatherBench 2, Stormer performs competitively at short to medium-range forecasts and outperforms current methods beyond 7 days, while requiring orders-of-magnitude less training data and compute. Additionally, we demonstrate Stormer's favorable scaling properties, showing consistent improvements in forecast accuracy with increases in model size and training tokens. Code and checkpoints are available at https://github.com/tung-nd/stormer.
△ Less
Submitted 22 October, 2024; v1 submitted 6 December, 2023;
originally announced December 2023.
-
Interpretable A-posteriori Error Indication for Graph Neural Network Surrogate Models
Authors:
Shivam Barwey,
Hojin Kim,
Romit Maulik
Abstract:
Data-driven surrogate modeling has surged in capability in recent years with the emergence of graph neural networks (GNNs), which can operate directly on mesh-based representations of data. The goal of this work is to introduce an interpretability enhancement procedure for GNNs, with application to unstructured mesh-based fluid dynamics modeling. Given a black-box baseline GNN model, the end resul…
▽ More
Data-driven surrogate modeling has surged in capability in recent years with the emergence of graph neural networks (GNNs), which can operate directly on mesh-based representations of data. The goal of this work is to introduce an interpretability enhancement procedure for GNNs, with application to unstructured mesh-based fluid dynamics modeling. Given a black-box baseline GNN model, the end result is an interpretable GNN model that isolates regions in physical space, corresponding to sub-graphs, that are intrinsically linked to the forecasting task while retaining the predictive capability of the baseline. These structures identified by the interpretable GNNs are adaptively produced in the forward pass and serve as explainable links between the baseline model architecture, the optimization goal, and known problem-specific physics. Additionally, through a regularization procedure, the interpretable GNNs can also be used to identify, during inference, graph nodes that correspond to a majority of the anticipated forecasting error, adding a novel interpretable error-tagging capability to baseline models. Demonstrations are performed using unstructured flow field data sourced from flow over a backward-facing step at high Reynolds numbers, with geometry extrapolations demonstrated for ramp and wall-mounted cube configurations.
△ Less
Submitted 24 October, 2024; v1 submitted 13 November, 2023;
originally announced November 2023.
-
DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies
Authors:
Shuaiwen Leon Song,
Bonnie Kruft,
Minjia Zhang,
Conglong Li,
Shiyang Chen,
Chengming Zhang,
Masahiro Tanaka,
Xiaoxia Wu,
Jeff Rasley,
Ammar Ahmad Awan,
Connor Holmes,
Martin Cai,
Adam Ghanem,
Zhongzhu Zhou,
Yuxiong He,
Pete Luferenko,
Divya Kumar,
Jonathan Weyn,
Ruixiong Zhang,
Sylwester Klocek,
Volodymyr Vragov,
Mohammed AlQuraishi,
Gustaf Ahdritz,
Christina Floristean,
Cristina Negri
, et al. (67 additional authors not shown)
Abstract:
In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. To answer this call, we present DeepSpeed4Science initiative (deepspeed4science.ai) which aims to build unique…
▽ More
In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. To answer this call, we present DeepSpeed4Science initiative (deepspeed4science.ai) which aims to build unique capabilities through AI system technology innovations to help domain experts to unlock today's biggest science mysteries. By leveraging DeepSpeed's current technology pillars (training, inference and compression) as base technology enablers, DeepSpeed4Science will create a new set of AI system technologies tailored for accelerating scientific discoveries by addressing their unique complexity beyond the common technical approaches used for accelerating generic large language models (LLMs). In this paper, we showcase the early progress we made with DeepSpeed4Science in addressing two of the critical system challenges in structural biology research.
△ Less
Submitted 11 October, 2023; v1 submitted 6 October, 2023;
originally announced October 2023.
-
Unlocking massively parallel spectral proper orthogonal decompositions in the PySPOD package
Authors:
Marcin Rogowski,
Brandon C. Y. Yeung,
Oliver T. Schmidt,
Romit Maulik,
Lisandro Dalcin,
Matteo Parsani,
Gianmarco Mengaldo
Abstract:
We propose a parallel (distributed) version of the spectral proper orthogonal decomposition (SPOD) technique. The parallel SPOD algorithm distributes the spatial dimension of the dataset preserving time. This approach is adopted to preserve the non-distributed fast Fourier transform of the data in time, thereby avoiding the associated bottlenecks. The parallel SPOD algorithm is implemented in the…
▽ More
We propose a parallel (distributed) version of the spectral proper orthogonal decomposition (SPOD) technique. The parallel SPOD algorithm distributes the spatial dimension of the dataset preserving time. This approach is adopted to preserve the non-distributed fast Fourier transform of the data in time, thereby avoiding the associated bottlenecks. The parallel SPOD algorithm is implemented in the PySPOD (https://github.com/MathEXLab/PySPOD) library and makes use of the standard message passing interface (MPI) library, implemented in Python via mpi4py (https://mpi4py.readthedocs.io/en/stable/). An extensive performance evaluation of the parallel package is provided, including strong and weak scalability analyses. The open-source library allows the analysis of large datasets of interest across the scientific community. Here, we present applications in fluid dynamics and geophysics, that are extremely difficult (if not impossible) to achieve without a parallel algorithm. This work opens the path toward modal analyses of big quasi-stationary data, helping to uncover new unexplored spatio-temporal patterns.
△ Less
Submitted 31 July, 2024; v1 submitted 21 September, 2023;
originally announced September 2023.
-
Robust experimental data assimilation for the Spalart-Allmaras turbulence model
Authors:
Deepinder Jot Singh Aulakh,
Xiang Yang,
Romit Maulik
Abstract:
This study presents a methodology focusing on the use of computational model and experimental data fusion to improve the Spalart-Allmaras (SA) closure model for Reynolds-averaged Navier-Stokes solutions. In particular, our goal is to develop a technique that not only assimilates sparse experimental data to improve turbulence model performance, but also preserves generalization for unseen cases by…
▽ More
This study presents a methodology focusing on the use of computational model and experimental data fusion to improve the Spalart-Allmaras (SA) closure model for Reynolds-averaged Navier-Stokes solutions. In particular, our goal is to develop a technique that not only assimilates sparse experimental data to improve turbulence model performance, but also preserves generalization for unseen cases by recovering classical SA behavior. We achieve our goals using data assimilation, namely the Ensemble Kalman filtering approach (EnKF), to calibrate the coefficients of the SA model for separated flows. A holistic calibration strategy is implemented via the parameterization of the production, diffusion, and destruction terms. This calibration relies on the assimilation of experimental data collected in the form of velocity profiles, skin friction, and pressure coefficients. Despite using observational data from a single flow condition around a backward-facing step (BFS), the recalibrated SA model demonstrates generalization to other separated flows, including cases such as the 2D NASA wall mounted hump (2D-WMH) and modified BFS. Significant improvement is observed in the quantities of interest, i.e., skin friction coefficient ($C_f$) and pressure coefficient ($C_p$) for each flow tested. Finally, it is also demonstrated that the newly proposed model recovers SA proficiency for flows, such as a NACA-0012 airfoil and axisymmetric jet (ASJ), and that the individually calibrated terms in the SA model target specific flow-physics wherein the calibrated production term improves the re-circulation zone while destruction improves the recovery zone.
△ Less
Submitted 24 July, 2024; v1 submitted 12 September, 2023;
originally announced September 2023.
-
Generalizable data-driven turbulence closure modeling on unstructured grids with differentiable physics
Authors:
Hojin Kim,
Varun Shankar,
Venkatasubramanian Viswanathan,
Romit Maulik
Abstract:
Differentiable physical simulators are proving to be valuable tools for developing data-driven models in computational fluid dynamics (CFD). These simulators enable end-to-end training of machine learning (ML) models embedded within CFD solvers. This paradigm enables novel algorithms which combine the generalization power and low cost of physics-based simulations with the flexibility and automatio…
▽ More
Differentiable physical simulators are proving to be valuable tools for developing data-driven models in computational fluid dynamics (CFD). These simulators enable end-to-end training of machine learning (ML) models embedded within CFD solvers. This paradigm enables novel algorithms which combine the generalization power and low cost of physics-based simulations with the flexibility and automation of deep learning methods. In this study, we introduce a framework for embedding deep learning models within a generic finite element solver to solve the Navier-Stokes equations, specifically applying this approach to learn a subgrid scale closure with a graph neural network (GNN). We validate our method for flow over a backwards-facing step and test its performance on novel geometries, demonstrating the ability to generalize to novel geometries without sacrificing stability. Additionally, we show that our GNN-based closure model may be learned in a data-limited scenario by interpreting closure modeling as a solver-constrained optimization. Our end-to-end learning paradigm demonstrates a viable pathway for physically consistent and generalizable data-driven closure modeling across complex geometries.
△ Less
Submitted 22 November, 2024; v1 submitted 25 July, 2023;
originally announced July 2023.
-
Importance of equivariant and invariant symmetries for fluid flow modeling
Authors:
Varun Shankar,
Shivam Barwey,
Zico Kolter,
Romit Maulik,
Venkatasubramanian Viswanathan
Abstract:
Graph neural networks (GNNs) have shown promise in learning unstructured mesh-based simulations of physical systems, including fluid dynamics. In tandem, geometric deep learning principles have informed the development of equivariant architectures respecting underlying physical symmetries. However, the effect of rotational equivariance in modeling fluids remains unclear. We build a multi-scale equ…
▽ More
Graph neural networks (GNNs) have shown promise in learning unstructured mesh-based simulations of physical systems, including fluid dynamics. In tandem, geometric deep learning principles have informed the development of equivariant architectures respecting underlying physical symmetries. However, the effect of rotational equivariance in modeling fluids remains unclear. We build a multi-scale equivariant GNN to forecast fluid flow and study the effect of modeling invariant and non-invariant representations of the flow state. We evaluate the model performance of several equivariant and non-equivariant architectures on predicting the evolution of two fluid flows, flow around a cylinder and buoyancy-driven shear flow, to understand the effect of equivariance and invariance on data-driven modeling approaches. Our results show that modeling invariant quantities produces more accurate long-term predictions and that these invariant quantities may be learned from the velocity field using a data-driven encoder.
△ Less
Submitted 3 May, 2023;
originally announced July 2023.
-
Differentiable Turbulence: Closure as a partial differential equation constrained optimization
Authors:
Varun Shankar,
Dibyajyoti Chakraborty,
Venkatasubramanian Viswanathan,
Romit Maulik
Abstract:
Deep learning is increasingly becoming a promising pathway to improving the accuracy of sub-grid scale (SGS) turbulence closure models for large eddy simulations (LES). We leverage the concept of differentiable turbulence, whereby an end-to-end differentiable solver is used in combination with physics-inspired choices of deep learning architectures to learn highly effective and versatile SGS model…
▽ More
Deep learning is increasingly becoming a promising pathway to improving the accuracy of sub-grid scale (SGS) turbulence closure models for large eddy simulations (LES). We leverage the concept of differentiable turbulence, whereby an end-to-end differentiable solver is used in combination with physics-inspired choices of deep learning architectures to learn highly effective and versatile SGS models for two-dimensional turbulent flow. We perform an in-depth analysis of the inductive biases in the chosen architectures, finding that the inclusion of small-scale non-local features is most critical to effective SGS modeling, while large-scale features can improve pointwise accuracy of the \textit{a-posteriori} solution field. The velocity gradient tensor on the LES grid can be mapped directly to the SGS stress via decomposition of the inputs and outputs into isotropic, deviatoric, and anti-symmetric components. We see that the model can generalize to a variety of flow configurations, including higher and lower Reynolds numbers and different forcing conditions. We show that the differentiable physics paradigm is more successful than offline, \textit{a-priori} learning, and that hybrid solver-in-the-loop approaches to deep learning offer an ideal balance between computational efficiency, accuracy, and generalization. Our experiments provide physics-based recommendations for deep-learning based SGS modeling for generalizable closure modeling of turbulence.
△ Less
Submitted 27 March, 2024; v1 submitted 7 July, 2023;
originally announced July 2023.
-
Generative modeling of time-dependent densities via optimal transport and projection pursuit
Authors:
Jonah Botvinick-Greenhouse,
Yunan Yang,
Romit Maulik
Abstract:
Motivated by the computational difficulties incurred by popular deep learning algorithms for the generative modeling of temporal densities, we propose a cheap alternative which requires minimal hyperparameter tuning and scales favorably to high dimensional problems. In particular, we use a projection-based optimal transport solver [Meng et al., 2019] to join successive samples and subsequently use…
▽ More
Motivated by the computational difficulties incurred by popular deep learning algorithms for the generative modeling of temporal densities, we propose a cheap alternative which requires minimal hyperparameter tuning and scales favorably to high dimensional problems. In particular, we use a projection-based optimal transport solver [Meng et al., 2019] to join successive samples and subsequently use transport splines [Chewi et al., 2020] to interpolate the evolving density. When the sampling frequency is sufficiently high, the optimal maps are close to the identity and are thus computationally efficient to compute. Moreover, the training process is highly parallelizable as all optimal maps are independent and can thus be learned simultaneously. Finally, the approach is based solely on numerical linear algebra rather than minimizing a nonconvex objective function, allowing us to easily analyze and control the algorithm. We present several numerical experiments on both synthetic and real-world datasets to demonstrate the efficiency of our method. In particular, these experiments show that the proposed approach is highly competitive compared with state-of-the-art normalizing flows conditioned on time across a wide range of dimensionalities.
△ Less
Submitted 12 October, 2023; v1 submitted 19 April, 2023;
originally announced April 2023.
-
Quantifying uncertainty for deep learning based forecasting and flow-reconstruction using neural architecture search ensembles
Authors:
Romit Maulik,
Romain Egele,
Krishnan Raghavan,
Prasanna Balaprakash
Abstract:
Classical problems in computational physics such as data-driven forecasting and signal reconstruction from sparse sensors have recently seen an explosion in deep neural network (DNN) based algorithmic approaches. However, most DNN models do not provide uncertainty estimates, which are crucial for establishing the trustworthiness of these techniques in downstream decision making tasks and scenarios…
▽ More
Classical problems in computational physics such as data-driven forecasting and signal reconstruction from sparse sensors have recently seen an explosion in deep neural network (DNN) based algorithmic approaches. However, most DNN models do not provide uncertainty estimates, which are crucial for establishing the trustworthiness of these techniques in downstream decision making tasks and scenarios. In recent years, ensemble-based methods have achieved significant success for the uncertainty quantification in DNNs on a number of benchmark problems. However, their performance on real-world applications remains under-explored. In this work, we present an automated approach to DNN discovery and demonstrate how this may also be utilized for ensemble-based uncertainty quantification. Specifically, we propose the use of a scalable neural and hyperparameter architecture search for discovering an ensemble of DNN models for complex dynamical systems. We highlight how the proposed method not only discovers high-performing neural network ensembles for our tasks, but also quantifies uncertainty seamlessly. This is achieved by using genetic algorithms and Bayesian optimization for sampling the search space of neural network architectures and hyperparameters. Subsequently, a model selection approach is used to identify candidate models for an ensemble set construction. Afterwards, a variance decomposition approach is used to estimate the uncertainty of the predictions from the ensemble. We demonstrate the feasibility of this framework for two tasks - forecasting from historical data and flow reconstruction from sparse sensors for the sea-surface temperature. We demonstrate superior performance from the ensemble in contrast with individual high-performing models and other benchmarks.
△ Less
Submitted 19 February, 2023;
originally announced February 2023.
-
Multiscale Graph Neural Network Autoencoders for Interpretable Scientific Machine Learning
Authors:
Shivam Barwey,
Varun Shankar,
Venkatasubramanian Viswanathan,
Romit Maulik
Abstract:
The goal of this work is to address two limitations in autoencoder-based models: latent space interpretability and compatibility with unstructured meshes. This is accomplished here with the development of a novel graph neural network (GNN) autoencoding architecture with demonstrations on complex fluid flow applications. To address the first goal of interpretability, the GNN autoencoder achieves re…
▽ More
The goal of this work is to address two limitations in autoencoder-based models: latent space interpretability and compatibility with unstructured meshes. This is accomplished here with the development of a novel graph neural network (GNN) autoencoding architecture with demonstrations on complex fluid flow applications. To address the first goal of interpretability, the GNN autoencoder achieves reduction in the number nodes in the encoding stage through an adaptive graph reduction procedure. This reduction procedure essentially amounts to flowfield-conditioned node sampling and sensor identification, and produces interpretable latent graph representations tailored to the flowfield reconstruction task in the form of so-called masked fields. These masked fields allow the user to (a) visualize where in physical space a given latent graph is active, and (b) interpret the time-evolution of the latent graph connectivity in accordance with the time-evolution of unsteady flow features (e.g. recirculation zones, shear layers) in the domain. To address the goal of unstructured mesh compatibility, the autoencoding architecture utilizes a series of multi-scale message passing (MMP) layers, each of which models information exchange among node neighborhoods at various lengthscales. The MMP layer, which augments standard single-scale message passing with learnable coarsening operations, allows the decoder to more efficiently reconstruct the flowfield from the identified regions in the masked fields. Analysis of latent graphs produced by the autoencoder for various model settings are conducted using using unstructured snapshot data sourced from large-eddy simulations in a backward-facing step (BFS) flow configuration with an OpenFOAM-based flow solver at high Reynolds numbers.
△ Less
Submitted 16 February, 2023; v1 submitted 13 February, 2023;
originally announced February 2023.
-
Modeling Wind Turbine Performance and Wake Interactions with Machine Learning
Authors:
C. Moss,
R. Maulik,
G. V. Iungo
Abstract:
Different machine learning (ML) models are trained on SCADA and meteorological data collected at an onshore wind farm and then assessed in terms of fidelity and accuracy for predictions of wind speed, turbulence intensity, and power capture at the turbine and wind farm levels for different wind and atmospheric conditions. ML methods for data quality control and pre-processing are applied to the da…
▽ More
Different machine learning (ML) models are trained on SCADA and meteorological data collected at an onshore wind farm and then assessed in terms of fidelity and accuracy for predictions of wind speed, turbulence intensity, and power capture at the turbine and wind farm levels for different wind and atmospheric conditions. ML methods for data quality control and pre-processing are applied to the data set under investigation and found to outperform standard statistical methods. A hybrid model, comprised of a linear interpolation model, Gaussian process, deep neural network (DNN), and support vector machine, paired with a DNN filter, is found to achieve high accuracy for modeling wind turbine power capture. Modifications of the incoming freestream wind speed and turbulence intensity, $TI$, due to the evolution of the wind field over the wind farm and effects associated with operating turbines are also captured using DNN models. Thus, turbine-level modeling is achieved using models for predicting power capture while farm-level modeling is achieved by combining models predicting wind speed and $TI$ at each turbine location from freestream conditions with models predicting power capture. Combining these models provides results consistent with expected power capture performance and holds promise for future endeavors in wind farm modeling and diagnostics. Though training ML models is computationally expensive, using the trained models to simulate the entire wind farm takes only a few seconds on a typical modern laptop computer, and the total computational cost is still lower than other available mid-fidelity simulation approaches.
△ Less
Submitted 2 December, 2022;
originally announced December 2022.
-
Differentiable physics-enabled closure modeling for Burgers' turbulence
Authors:
Varun Shankar,
Vedant Puri,
Ramesh Balakrishnan,
Romit Maulik,
Venkatasubramanian Viswanathan
Abstract:
Data-driven turbulence modeling is experiencing a surge in interest following algorithmic and hardware developments in the data sciences. We discuss an approach using the differentiable physics paradigm that combines known physics with machine learning to develop closure models for Burgers' turbulence. We consider the 1D Burgers system as a prototypical test problem for modeling the unresolved ter…
▽ More
Data-driven turbulence modeling is experiencing a surge in interest following algorithmic and hardware developments in the data sciences. We discuss an approach using the differentiable physics paradigm that combines known physics with machine learning to develop closure models for Burgers' turbulence. We consider the 1D Burgers system as a prototypical test problem for modeling the unresolved terms in advection-dominated turbulence problems. We train a series of models that incorporate varying degrees of physical assumptions on an a posteriori loss function to test the efficacy of models across a range of system parameters, including viscosity, time, and grid resolution. We find that constraining models with inductive biases in the form of partial differential equations that contain known physics or existing closure approaches produces highly data-efficient, accurate, and generalizable models, outperforming state-of-the-art baselines. Addition of structure in the form of physics information also brings a level of interpretability to the models, potentially offering a stepping stone to the future of closure modeling.
△ Less
Submitted 23 September, 2022;
originally announced September 2022.
-
Stabilized Neural Ordinary Differential Equations for Long-Time Forecasting of Dynamical Systems
Authors:
Alec J. Linot,
Joshua W. Burby,
Qi Tang,
Prasanna Balaprakash,
Michael D. Graham,
Romit Maulik
Abstract:
In data-driven modeling of spatiotemporal phenomena careful consideration often needs to be made in capturing the dynamics of the high wavenumbers. This problem becomes especially challenging when the system of interest exhibits shocks or chaotic dynamics. We present a data-driven modeling method that accurately captures shocks and chaotic dynamics by proposing a novel architecture, stabilized neu…
▽ More
In data-driven modeling of spatiotemporal phenomena careful consideration often needs to be made in capturing the dynamics of the high wavenumbers. This problem becomes especially challenging when the system of interest exhibits shocks or chaotic dynamics. We present a data-driven modeling method that accurately captures shocks and chaotic dynamics by proposing a novel architecture, stabilized neural ordinary differential equation (ODE). In our proposed architecture, we learn the right-hand-side (RHS) of an ODE by adding the outputs of two NN together where one learns a linear term and the other a nonlinear term. Specifically, we implement this by training a sparse linear convolutional NN to learn the linear term and a dense fully-connected nonlinear NN to learn the nonlinear term. This is in contrast with the standard neural ODE which involves training only a single NN for learning the RHS. We apply this setup to the viscous Burgers equation, which exhibits shocked behavior, and show better short-time tracking and prediction of the energy spectrum at high wavenumbers than a standard neural ODE. We also find that the stabilized neural ODE models are much more robust to noisy initial conditions than the standard neural ODE approach. We also apply this method to chaotic trajectories of the Kuramoto-Sivashinsky equation. In this case, stabilized neural ODEs keep long-time trajectories on the attractor, and are highly robust to noisy initial conditions, while standard neural ODEs fail at achieving either of these results. We conclude by demonstrating how stabilizing neural ODEs provide a natural extension for use in reduced-order modeling by projecting the dynamics onto the eigenvectors of the learned linear term.
△ Less
Submitted 3 October, 2022; v1 submitted 29 March, 2022;
originally announced March 2022.
-
Multi-fidelity reinforcement learning framework for shape optimization
Authors:
Sahil Bhola,
Suraj Pawar,
Prasanna Balaprakash,
Romit Maulik
Abstract:
Deep reinforcement learning (DRL) is a promising outer-loop intelligence paradigm which can deploy problem solving strategies for complex tasks. Consequently, DRL has been utilized for several scientific applications, specifically in cases where classical optimization or control methods are limited. One key limitation of conventional DRL methods is their episode-hungry nature which proves to be a…
▽ More
Deep reinforcement learning (DRL) is a promising outer-loop intelligence paradigm which can deploy problem solving strategies for complex tasks. Consequently, DRL has been utilized for several scientific applications, specifically in cases where classical optimization or control methods are limited. One key limitation of conventional DRL methods is their episode-hungry nature which proves to be a bottleneck for tasks which involve costly evaluations of a numerical forward model. In this article, we address this limitation of DRL by introducing a controlled transfer learning framework that leverages a multi-fidelity simulation setting. Our strategy is deployed for an airfoil shape optimization problem at high Reynolds numbers, where our framework can learn an optimal policy for generating efficient airfoil shapes by gathering knowledge from multi-fidelity environments and reduces computational costs by over 30\%. Furthermore, our formulation promotes policy exploration and generalization to new environments, thereby preventing over-fitting to data from solely one fidelity. Our results demonstrate this framework's applicability to other scientific DRL scenarios where multi-fidelity environments can be used for policy learning.
△ Less
Submitted 22 February, 2022;
originally announced February 2022.
-
AutoDEUQ: Automated Deep Ensemble with Uncertainty Quantification
Authors:
Romain Egele,
Romit Maulik,
Krishnan Raghavan,
Bethany Lusch,
Isabelle Guyon,
Prasanna Balaprakash
Abstract:
Deep neural networks are powerful predictors for a variety of tasks. However, they do not capture uncertainty directly. Using neural network ensembles to quantify uncertainty is competitive with approaches based on Bayesian neural networks while benefiting from better computational scalability. However, building ensembles of neural networks is a challenging task because, in addition to choosing th…
▽ More
Deep neural networks are powerful predictors for a variety of tasks. However, they do not capture uncertainty directly. Using neural network ensembles to quantify uncertainty is competitive with approaches based on Bayesian neural networks while benefiting from better computational scalability. However, building ensembles of neural networks is a challenging task because, in addition to choosing the right neural architecture or hyperparameters for each member of the ensemble, there is an added cost of training each model. We propose AutoDEUQ, an automated approach for generating an ensemble of deep neural networks. Our approach leverages joint neural architecture and hyperparameter search to generate ensembles. We use the law of total variance to decompose the predictive variance of deep ensembles into aleatoric (data) and epistemic (model) uncertainties. We show that AutoDEUQ outperforms probabilistic backpropagation, Monte Carlo dropout, deep ensemble, distribution-free ensembles, and hyper ensemble methods on a number of regression benchmarks.
△ Less
Submitted 4 July, 2022; v1 submitted 26 October, 2021;
originally announced October 2021.
-
Assessments of epistemic uncertainty using Gaussian stochastic weight averaging for fluid-flow regression
Authors:
Masaki Morimoto,
Kai Fukami,
Romit Maulik,
Ricardo Vinuesa,
Koji Fukagata
Abstract:
We use Gaussian stochastic weight averaging (SWAG) to assess the model-form uncertainty associated with neural-network-based function approximation relevant to fluid flows. SWAG approximates a posterior Gaussian distribution of each weight, given training data, and a constant learning rate. Having access to this distribution, it is able to create multiple models with various combinations of sample…
▽ More
We use Gaussian stochastic weight averaging (SWAG) to assess the model-form uncertainty associated with neural-network-based function approximation relevant to fluid flows. SWAG approximates a posterior Gaussian distribution of each weight, given training data, and a constant learning rate. Having access to this distribution, it is able to create multiple models with various combinations of sampled weights, which can be used to obtain ensemble predictions. The average of such an ensemble can be regarded as the `mean estimation', whereas its standard deviation can be used to construct `confidence intervals', which enable us to perform uncertainty quantification (UQ) with regard to the training process of neural networks. We utilize representative neural-network-based function approximation tasks for the following cases: (i) a two-dimensional circular-cylinder wake; (ii) the DayMET dataset (maximum daily temperature in North America); (iii) a three-dimensional square-cylinder wake; and (iv) urban flow, to assess the generalizability of the present idea for a wide range of complex datasets. SWAG-based UQ can be applied regardless of the network architecture, and therefore, we demonstrate the applicability of the method for two types of neural networks: (i) global field reconstruction from sparse sensors by combining convolutional neural network (CNN) and multi-layer perceptron (MLP); and (ii) far-field state estimation from sectional data with two-dimensional CNN. We find that SWAG can obtain physically-interpretable confidence-interval estimates from the perspective of model-form uncertainty. This capability supports its use for a wide range of problems in science and engineering.
△ Less
Submitted 14 July, 2022; v1 submitted 16 September, 2021;
originally announced September 2021.
-
Data-Driven Wind Turbine Wake Modeling via Probabilistic Machine Learning
Authors:
S. Ashwin Renganathan,
Romit Maulik,
Stefano Letizia,
Giacomo Valerio Iungo
Abstract:
Wind farm design primarily depends on the variability of the wind turbine wake flows to the atmospheric wind conditions, and the interaction between wakes. Physics-based models that capture the wake flow-field with high-fidelity are computationally very expensive to perform layout optimization of wind farms, and, thus, data-driven reduced order models can represent an efficient alternative for sim…
▽ More
Wind farm design primarily depends on the variability of the wind turbine wake flows to the atmospheric wind conditions, and the interaction between wakes. Physics-based models that capture the wake flow-field with high-fidelity are computationally very expensive to perform layout optimization of wind farms, and, thus, data-driven reduced order models can represent an efficient alternative for simulating wind farms. In this work, we use real-world light detection and ranging (LiDAR) measurements of wind-turbine wakes to construct predictive surrogate models using machine learning. Specifically, we first demonstrate the use of deep autoencoders to find a low-dimensional \emph{latent} space that gives a computationally tractable approximation of the wake LiDAR measurements. Then, we learn the mapping between the parameter space and the (latent space) wake flow-fields using a deep neural network. Additionally, we also demonstrate the use of a probabilistic machine learning technique, namely, Gaussian process modeling, to learn the parameter-space-latent-space mapping in addition to the epistemic and aleatoric uncertainty in the data. Finally, to cope with training large datasets, we demonstrate the use of variational Gaussian process models that provide a tractable alternative to the conventional Gaussian process models for large datasets. Furthermore, we introduce the use of active learning to adaptively build and improve a conventional Gaussian process model predictive capability. Overall, we find that our approach provides accurate approximations of the wind-turbine wake flow field that can be queried at an orders-of-magnitude cheaper cost than those generated with high-fidelity physics-based simulations.
△ Less
Submitted 16 November, 2021; v1 submitted 6 September, 2021;
originally announced September 2021.
-
PyParSVD: A streaming, distributed and randomized singular-value-decomposition library
Authors:
Romit Maulik,
Gianmarco Mengaldo
Abstract:
We introduce PyParSVD\footnote{https://github.com/Romit-Maulik/PyParSVD}, a Python library that implements a streaming, distributed and randomized algorithm for the singular value decomposition. To demonstrate its effectiveness, we extract coherent structures from scientific data. Futhermore, we show weak scaling assessments on up to 256 nodes of the Theta machine at Argonne Leadership Computing F…
▽ More
We introduce PyParSVD\footnote{https://github.com/Romit-Maulik/PyParSVD}, a Python library that implements a streaming, distributed and randomized algorithm for the singular value decomposition. To demonstrate its effectiveness, we extract coherent structures from scientific data. Futhermore, we show weak scaling assessments on up to 256 nodes of the Theta machine at Argonne Leadership Computing Facility, demonstrating potential for large-scale data analyses of practical data sets.
△ Less
Submitted 19 August, 2021;
originally announced August 2021.
-
Learning the temporal evolution of multivariate densities via normalizing flows
Authors:
Yubin Lu,
Romit Maulik,
Ting Gao,
Felix Dietrich,
Ioannis G. Kevrekidis,
Jinqiao Duan
Abstract:
In this work, we propose a method to learn multivariate probability distributions using sample path data from stochastic differential equations. Specifically, we consider temporally evolving probability distributions (e.g., those produced by integrating local or nonlocal Fokker-Planck equations). We analyze this evolution through machine learning assisted construction of a time-dependent mapping t…
▽ More
In this work, we propose a method to learn multivariate probability distributions using sample path data from stochastic differential equations. Specifically, we consider temporally evolving probability distributions (e.g., those produced by integrating local or nonlocal Fokker-Planck equations). We analyze this evolution through machine learning assisted construction of a time-dependent mapping that takes a reference distribution (say, a Gaussian) to each and every instance of our evolving distribution. If the reference distribution is the initial condition of a Fokker-Planck equation, what we learn is the time-T map of the corresponding solution. Specifically, the learned map is a multivariate normalizing flow that deforms the support of the reference density to the support of each and every density snapshot in time. We demonstrate that this approach can approximate probability density function evolutions in time from observed sampled data for systems driven by both Brownian and Lévy noise. We present examples with two- and three-dimensional, uni- and multimodal distributions to validate the method.
△ Less
Submitted 3 May, 2022; v1 submitted 29 July, 2021;
originally announced July 2021.
-
PythonFOAM: In-situ data analyses with OpenFOAM and Python
Authors:
Romit Maulik,
Dimitrios Fytanidis,
Bethany Lusch,
Venkatram Vishwanath,
Saumil Patel
Abstract:
We outline the development of a general-purpose Python-based data analysis tool for OpenFOAM. Our implementation relies on the construction of OpenFOAM applications that have bindings to data analysis libraries in Python. Double precision data in OpenFOAM is cast to a NumPy array using the NumPy C-API and Python modules may then be used for arbitrary data analysis and manipulation on flow-field in…
▽ More
We outline the development of a general-purpose Python-based data analysis tool for OpenFOAM. Our implementation relies on the construction of OpenFOAM applications that have bindings to data analysis libraries in Python. Double precision data in OpenFOAM is cast to a NumPy array using the NumPy C-API and Python modules may then be used for arbitrary data analysis and manipulation on flow-field information. We highlight how the proposed wrapper may be used for an in-situ online singular value decomposition (SVD) implemented in Python and accessed from the OpenFOAM solver PimpleFOAM. Here, `in-situ' refers to a programming paradigm that allows for a concurrent computation of the data analysis on the same computational resources utilized for the partial differential equation solver. In addition, to demonstrate data-parallel analyses, we deploy a distributed SVD, which collects snapshot data across the ranks of a distributed simulation to compute the global left singular vectors. Crucially, both OpenFOAM and Python share the same message passing interface (MPI) communicator for this deployment which allows Python objects and functions to exchange NumPy arrays across ranks. Subsequently, we provide scaling assessments of this distributed SVD on multiple nodes of Intel Broadwell and KNL architectures for canonical test cases such as the large eddy simulations of a backward facing step and a channel flow at friction Reynolds number of 395. Finally, we demonstrate the deployment of a deep neural network for compressing the flow-field information using an autoencoder to demonstrate an ability to use state-of-the-art machine learning tools in the Python ecosystem.
△ Less
Submitted 12 August, 2021; v1 submitted 16 March, 2021;
originally announced March 2021.
-
Global field reconstruction from sparse sensors with Voronoi tessellation-assisted deep learning
Authors:
Kai Fukami,
Romit Maulik,
Nesar Ramachandra,
Koji Fukagata,
Kunihiko Taira
Abstract:
Achieving accurate and robust global situational awareness of a complex time-evolving field from a limited number of sensors has been a longstanding challenge. This reconstruction problem is especially difficult when sensors are sparsely positioned in a seemingly random or unorganized manner, which is often encountered in a range of scientific and engineering problems. Moreover, these sensors can…
▽ More
Achieving accurate and robust global situational awareness of a complex time-evolving field from a limited number of sensors has been a longstanding challenge. This reconstruction problem is especially difficult when sensors are sparsely positioned in a seemingly random or unorganized manner, which is often encountered in a range of scientific and engineering problems. Moreover, these sensors can be in motion and can become online or offline over time. The key leverage in addressing this scientific issue is the wealth of data accumulated from the sensors. As a solution to this problem, we propose a data-driven spatial field recovery technique founded on a structured grid-based deep-learning approach for arbitrary positioned sensors of any numbers. It should be noted that the naïve use of machine learning becomes prohibitively expensive for global field reconstruction and is furthermore not adaptable to an arbitrary number of sensors. In the present work, we consider the use of Voronoi tessellation to obtain a structured-grid representation from sensor locations enabling the computationally tractable use of convolutional neural networks. One of the central features of the present method is its compatibility with deep-learning based super-resolution reconstruction techniques for structured sensor data that are established for image processing. The proposed reconstruction technique is demonstrated for unsteady wake flow, geophysical data, and three-dimensional turbulence. The current framework is able to handle an arbitrary number of moving sensors, and thereby overcomes a major limitation with existing reconstruction methods. The presented technique opens a new pathway towards the practical use of neural networks for real-time global field estimation.
△ Less
Submitted 22 July, 2021; v1 submitted 2 January, 2021;
originally announced January 2021.
-
Deploying deep learning in OpenFOAM with TensorFlow
Authors:
Romit Maulik,
Himanshu Sharma,
Saumil Patel,
Bethany Lusch,
Elise Jennings
Abstract:
We outline the development of a data science module within OpenFOAM which allows for the in-situ deployment of trained deep learning architectures for general-purpose predictive tasks. This module is constructed with the TensorFlow C API and is integrated into OpenFOAM as an application that may be linked at run time. Notably, our formulation precludes any restrictions related to the type of neura…
▽ More
We outline the development of a data science module within OpenFOAM which allows for the in-situ deployment of trained deep learning architectures for general-purpose predictive tasks. This module is constructed with the TensorFlow C API and is integrated into OpenFOAM as an application that may be linked at run time. Notably, our formulation precludes any restrictions related to the type of neural network architecture (i.e., convolutional, fully-connected, etc.). This allows for potential studies of complicated neural architectures for practical CFD problems. In addition, the proposed module outlines a path towards an open-source, unified and transparent framework for computational fluid dynamics and machine learning.
△ Less
Submitted 1 December, 2020;
originally announced December 2020.
-
Meta-modeling strategy for data-driven forecasting
Authors:
Dominic J. Skinner,
Romit Maulik
Abstract:
Accurately forecasting the weather is a key requirement for climate change mitigation. Data-driven methods offer the ability to make more accurate forecasts, but lack interpretability and can be expensive to train and deploy if models are not carefully developed. Here, we make use of two historical climate data sets and tools from machine learning, to accurately predict temperature fields. Further…
▽ More
Accurately forecasting the weather is a key requirement for climate change mitigation. Data-driven methods offer the ability to make more accurate forecasts, but lack interpretability and can be expensive to train and deploy if models are not carefully developed. Here, we make use of two historical climate data sets and tools from machine learning, to accurately predict temperature fields. Furthermore, we are able to use low fidelity models that are cheap to train and evaluate, to selectively avoid expensive high fidelity function evaluations, as well as uncover seasonal variations in predictive power. This allows for an adaptive training strategy for computationally efficient geophysical emulation.
△ Less
Submitted 14 November, 2020;
originally announced December 2020.
-
Latent-space time evolution of non-intrusive reduced-order models using Gaussian process emulation
Authors:
Romit Maulik,
Themistoklis Botsas,
Nesar Ramachandra,
Lachlan Robert Mason,
Indranil Pan
Abstract:
Non-intrusive reduced-order models (ROMs) have recently generated considerable interest for constructing computationally efficient counterparts of nonlinear dynamical systems emerging from various domain sciences. They provide a low-dimensional emulation framework for systems that may be intrinsically high-dimensional. This is accomplished by utilizing a construction algorithm that is purely data-…
▽ More
Non-intrusive reduced-order models (ROMs) have recently generated considerable interest for constructing computationally efficient counterparts of nonlinear dynamical systems emerging from various domain sciences. They provide a low-dimensional emulation framework for systems that may be intrinsically high-dimensional. This is accomplished by utilizing a construction algorithm that is purely data-driven. It is no surprise, therefore, that the algorithmic advances of machine learning have led to non-intrusive ROMs with greater accuracy and computational gains. However, in bypassing the utilization of an equation-based evolution, it is often seen that the interpretability of the ROM framework suffers. This becomes more problematic when black-box deep learning methods are used which are notorious for lacking robustness outside the physical regime of the observed data. In this article, we propose the use of a novel latent-space interpolation algorithm based on Gaussian process regression. Notably, this reduced-order evolution of the system is parameterized by control parameters to allow for interpolation in space. The use of this procedure also allows for a continuous interpretation of time which allows for temporal interpolation. The latter aspect provides information, with quantified uncertainty, about full-state evolution at a finer resolution than that utilized for training the ROMs. We assess the viability of this algorithm for an advection-dominated system given by the inviscid shallow water equations.
△ Less
Submitted 15 October, 2020; v1 submitted 23 July, 2020;
originally announced July 2020.
-
Site-specific graph neural network for predicting protonation energy of oxygenate molecules
Authors:
Romit Maulik,
Rajeev Surendran Array,
Prasanna Balaprakash
Abstract:
Bio-oil molecule assessment is essential for the sustainable development of chemicals and transportation fuels. These oxygenated molecules have adequate carbon, hydrogen, and oxygen atoms that can be used for developing new value-added molecules (chemicals or transportation fuels). One motivation for our study stems from the fact that a liquid phase upgrading using mineral acid is a cost-effective…
▽ More
Bio-oil molecule assessment is essential for the sustainable development of chemicals and transportation fuels. These oxygenated molecules have adequate carbon, hydrogen, and oxygen atoms that can be used for developing new value-added molecules (chemicals or transportation fuels). One motivation for our study stems from the fact that a liquid phase upgrading using mineral acid is a cost-effective chemical transformation. In this chemical upgrading process, adding a proton (positively charged atomic hydrogen) to an oxygen atom is a central step. The protonation energies of oxygen atoms in a molecule determine the thermodynamic feasibility of the reaction and likely chemical reaction pathway. A quantum chemical model based on coupled cluster theory is used to compute accurate thermochemical properties such as the protonation energies of oxygen atoms and the feasibility of protonation-based chemical transformations. However, this method is too computationally expensive to explore a large space of chemical transformations. We develop a graph neural network approach for predicting protonation energies of oxygen atoms of hundreds of bioxygenate molecules to predict the feasibility of aqueous acidic reactions. Our approach relies on an iterative local nonlinear embedding that gradually leads to global influence of distant atoms and a output layer that predicts the protonation energy. Our approach is geared to site-specific predictions for individual oxygen atoms of a molecule in comparison with commonly used graph convolutional networks that focus on a singular molecular property prediction. We demonstrate that our approach is effective in learning the location and magnitudes of protonation energies of oxygenated molecules.
△ Less
Submitted 18 September, 2019;
originally announced January 2020.
-
Using recurrent neural networks for nonlinear component computation in advection-dominated reduced-order models
Authors:
Romit Maulik,
Vishwas Rao,
Sandeep Madireddy,
Bethany Lusch,
Prasanna Balaprakash
Abstract:
Rapid simulations of advection-dominated problems are vital for multiple engineering and geophysical applications. In this paper, we present a long short-term memory neural network to approximate the nonlinear component of the reduced-order model (ROM) of an advection-dominated partial differential equation. This is motivated by the fact that the nonlinear term is the most expensive component of a…
▽ More
Rapid simulations of advection-dominated problems are vital for multiple engineering and geophysical applications. In this paper, we present a long short-term memory neural network to approximate the nonlinear component of the reduced-order model (ROM) of an advection-dominated partial differential equation. This is motivated by the fact that the nonlinear term is the most expensive component of a successful ROM. For our approach, we utilize a Galerkin projection to isolate the linear and the transient components of the dynamical system and then use discrete empirical interpolation to generate training data for supervised learning. We note that the numerical time-advancement and linear-term computation of the system ensure a greater preservation of physics than does a process that is fully modeled. Our results show that the proposed framework recovers transient dynamics accurately without nonlinear term computations in full-order space and represents a cost-effective alternative to solely equation-based ROMs.
△ Less
Submitted 1 November, 2019; v1 submitted 18 September, 2019;
originally announced September 2019.