Data-Driven Turbulence Modeling Approach for Cold-Wall Hypersonic Boundary Layers

Authors: Muhammad I. Zafar, Xuhui Zhou, Christopher J. Roy, David Stelter, Heng Xiao

Abstract: Wall-cooling effect in hypersonic boundary layers can significantly alter the near-wall turbulence behavior, which is not accurately modeled by traditional RANS turbulence models. To address this shortcoming, this paper presents a turbulence modeling approach for hypersonic flows with cold-wall conditions using an iterative ensemble Kalman method. Specifically, a neural-network-based turbulence model is used to provide closure mapping from mean flow quantities to Reynolds stress as well as a variable turbulent Prandtl number. Sparse observation data of velocity and temperature are used to train the turbulence model. This approach is analyzed using direct numerical simulation database for zero-pressure gradient (ZPG) boundary layer flows over a flat plate with a Mach number between 6 and 14 and wall-to-recovery temperature ratios ranging from 0.18 to 0.76. Two training cases are conducted: 1) a single training case with observation data from one flow case, 2) a joint training case where data from two flow cases are simultaneously used for training. Trained models are also tested for generalizability on the remaining flow cases in each of the training cases. The results are also analyzed for insights to inform the future work towards enhancing the generalizability of the learned turbulence model.

Submitted 16 April, 2025; v1 submitted 25 June, 2024; originally announced June 2024.

arXiv:2312.11842 [pdf, other]

Neural operator-based super-fidelity: A warm-start approach for accelerating steady-state simulations

Authors: Xu-Hui Zhou, Jiequn Han, Muhammad I. Zafar, Eric M. Wolf, Christopher R. Schrock, Christopher J. Roy, Heng Xiao

Abstract: Recently, the use of neural networks to accelerate the solving of partial differential equations (PDEs) has gained significant traction in both academia and industry. However, employing neural networks as standalone surrogate models raises concerns about solution reliability, especially in precision-critical scientific tasks. This study introduces a novel "super-fidelity" method that leverages neural networks for warm-starting steady-state PDE solvers, ensuring both efficiency and accuracy. Inspired by super-resolution techniques in computer vision, this method maps low-fidelity solutions to high-fidelity targets using a vector-cloud neural network with equivariance (VCNN-e), a neural operator that preserves all necessary invariance and equivariance properties for scalar and vector predictions while seamlessly adapting to different spatial discretizations. We evaluated this approach in three scenarios: (1) a weakly nonlinear case involving low Reynolds number flows around elliptical cylinders, (2) a strongly nonlinear case with high Reynolds number flows over airfoils, and (3) a practical case with high Reynolds number flows over a wing. In all cases, the neural operator-based initialization accelerated convergence by at least two-fold compared to traditional methods, without sacrificing accuracy. The method's robustness and scalability are further demonstrated across different linear equation solvers and multi-process computing configurations. It also achieves overall time savings in scenarios with multiple simulations, even when accounting for model development time. Overall, our approach provides an effective means to accelerate steady-state PDE solutions using neural operators, maintaining high accuracy while significantly improving computational efficiency, particularly in precision-driven scientific applications.

Submitted 26 February, 2025; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2306.14011 [pdf, other]

Machine Learning-driven Autotuning of Graphics Processing Unit Accelerated Computational Fluid Dynamics for Enhanced Performance

Authors: Weicheng Xue, Christopher John Roy

Abstract: Optimizing the performance of computational fluid dynamics (CFD) applications accelerated by graphics processing units (GPUs) is crucial for efficient simulations. In this study, we employed a machine learning-based autotuning technique to optimize 14 key parameters related to GPU kernel scheduling, including the number of thread blocks and threads within a block. Our approach utilizes fully connected neural networks as the underlying machine learning model, with the tuning parameters as inputs to the neural networks and the actual execution time of a simulation as the outputs. To assess the effectiveness of our autotuning approach, we conducted experiments on three different types of GPUs, with computational speeds ranging from low to high. We performed independent training for each GPU model and also explored combined training across multiple GPU models. By leveraging artificial neural networks, our autotuning technique achieved remarkable results in tuning a wide range of parameters, leading to enhanced performance for a CFD code. Importantly, our approach demonstrated its efficacy while requiring only a small fraction of samples from the large parameter search space. This efficiency is attributed to the effectiveness of the fully connected neural networks in capturing the complex relationships between the parameter settings and the resulting performance. Overall, our study showcases the potential of machine learning, specifically fully connected neural networks, in autotuning GPU-accelerated CFD codes. By leveraging this approach, researchers and practitioners can achieve high performance in scientific simulations with optimized parameter configurations.

Submitted 20 February, 2024; v1 submitted 24 June, 2023; originally announced June 2023.

arXiv:2305.18057 [pdf, other]

CPU-GPU Heterogeneous Code Acceleration of a Finite Volume Computational Fluid Dynamics Solver

Authors: Weicheng Xue, Hongyu Wang, Christopher J. Roy

Abstract: This work deals with the CPU-GPU heterogeneous code acceleration of a finite-volume CFD solver utilizing multiple CPUs and GPUs at the same time. First, a high-level description of the CFD solver called SENSEI, the discretization of SENSEI, and the CPU-GPU heterogeneous computing workflow in SENSEI leveraging MPI and OpenACC are given. Then, a performance model for CPU-GPU heterogeneous computing requiring ghost cell exchange is proposed to help estimate the performance of the heterogeneous implementation. The scaling performance of the CPU-GPU heterogeneous computing and its comparison with the pure multi-CPU/GPU performance for a supersonic inlet test case is presented to display the advantages of leveraging the computational power of both the CPU and the GPU. Using CPUs and GPUs as workers together, the performance can be improved further compared to using pure CPUs or GPUs, and the advantages can be fairly estimated by the performance model proposed in this work. Finally, conclusions are drawn to provide 1) suggestions for application users who have an interest to leverage the computational power of the CPU and GPU to accelerate their own scientific computing simulations and 2) feedback for hardware architects who have an interest to design a better CPU-GPU heterogeneous system for heterogeneous computing.

Submitted 29 May, 2023; originally announced May 2023.

arXiv:2012.02925 [pdf, other]

doi 10.1016/j.jpdc.2021.05.010

An Improved Framework of GPU Computing for CFD Applications on Structured Grids using OpenACC

Authors: Weicheng Xue, Charles W. Jackson, Christopher J. Roy

Abstract: This paper is focused on improving multi-GPU performance of a research CFD code on structured grids. MPI and OpenACC directives are used to scale the code up to 16 GPUs. This paper shows that using 16 P100 GPUs and 16 V100 GPUs can be 30× and 70× faster than 16 Xeon CPU E5-2680v4 cores for three different test cases, respectively. A series of performance issues related to the scaling for the multi-block CFD code are addressed by applying various optimizations. Performance optimizations such as the pack/unpack message method, removing temporary arrays as arguments to procedure calls, allocating global memory for limiters and connected boundary data, reordering non-blocking MPI I_send/I_recv and Wait calls, reducing unnecessary implicit derived type member data movement between the host and the device and the use of GPUDirect can improve the compute utilization, memory throughput, and asynchronous progression in the multi-block CFD code using modern programming features.

Submitted 4 December, 2020; originally announced December 2020.

Comments: 43 pages, 27 figures

arXiv:2006.02602 [pdf, other]

doi 10.1002/cpe.6036

Multi-GPU Performance Optimization of a CFD Code using OpenACC on Different Platforms

Authors: Weicheng Xue, Christopher J. Roy

Abstract: This paper investigates the multi-GPU performance of a 3D buoyancy driven cavity solver using MPI and OpenACC directives on different platforms. The paper shows that decomposing the total problem in different dimensions affects the strong scaling performance significantly for the GPU. Without proper performance optimizations, it is shown that 1D domain decomposition scales poorly on multiple GPUs due to the noncontiguous memory access. The performance using whatever decompositions can be benefited from a series of performance optimizations in the paper. Since the buoyancy driven cavity code is latency-bounded on the clusters examined, a series of optimizations both agnostic and tailored to the platforms are designed to reduce the latency cost and improve memory throughput between hosts and devices efficiently. First, the parallel message packing/unpacking strategy developed for noncontiguous data movement between hosts and devices improves the overall performance by about a factor of 2. Second, transferring different data based on the stencil sizes for different variables further reduces the communication overhead. These two optimizations are general enough to be beneficial to stencil computations having ghost changes on all of the clusters tested. Third, GPUDirect is used to improve the communication on clusters which have the hardware and software support for direct communication between GPUs without staging CPU's memory. Finally, overlapping the communication and computations is shown to be not efficient on multi-GPUs if only using MPI or MPI+OpenACC. Although we believe our implementation has revealed enough overlap, the actual running does not utilize the overlap well due to a lack of asynchronous progression.

Submitted 3 June, 2020; originally announced June 2020.

arXiv:1907.05591 [pdf]

Strain distribution and thermal strain relaxation in MOVPE grown hBN films on sapphire substrates

Authors: Kousik Bera, D. Chugh, Atanu Patra, H. Hoe Tan, C. Jagadish, Anushree Roy

Abstract: Recently, hexagonal boron nitride (hBN) layers have generated a lot of interest as ideal substrates for 2D stacked devices. Sapphire-supported thin hBN films of different thicknesses are grown using metalorganic vapour phase epitaxy technique by following a flow modulation scheme. Though these films of relatively large size are potential candidates to be employed in designing real devices, they exhibit wrinkling. The formation of wrinkles is a key signature of strain distribution in a film. Raman imaging has been utilized to study the residual strain distribution in these wrinkled hBN films. An increase in the overall compressive strain in the films with an increase in the layer thickness has been observed. To find whether the residual lattice strain in the films can be removed by a thermal treatment, temperature dependent Raman measurements of these films are carried out. The study demonstrates that the thermal rate of strain evolution is higher in the films of lower thickness than in the thicker films. This observation further provides a possible explanation for the variation of strain in the as-grown films. An empirical relation has been proposed for estimating the residual strain from the morphology of the films. We have also shown that the residual strain can be partially released by the delamination of the films.

Submitted 12 July, 2019; originally announced July 2019.

Comments: 27 pages, 10 figures

arXiv:1607.06834 [pdf, other]

doi 10.1016/j.compfluid.2017.09.014

A Numerical Investigation of Matrix-Free Implicit Time-Stepping Methods for Large CFD Simulations

Authors: Arash Sarshar, Paul Tranquilli, Brent Pickering, Andrew McCall, Adrian Sandu, Christopher J. Roy

Abstract: This paper is concerned with the development and testing of advanced time-stepping methods suited for the integration of time-accurate, real-world applications of computational fluid dynamics (CFD). The performance of several time discretization methods is studied numerically with regards to computational efficiency, order of accuracy, and stability, as well as the ability to treat effectively stiff problems. We consider matrix-free implementations, a popular approach for time-stepping methods applied to large CFD applications due to its adherence to scalable matrix-vector operations and a small memory footprint. We compare explicit methods with matrix-free implementations of implicit, linearly-implicit, as well as Rosenbrock-Krylov methods. We show that Rosenbrock-Krylov methods are competitive with existing techniques excelling for a number of problem types and settings.

Submitted 30 September, 2017; v1 submitted 22 July, 2016; originally announced July 2016.

Report number: Computational Science Lab CSL-TR-16-6

MSC Class: 65L05; 65L06; 65L20

Journal ref: Computers & Fluids, Volume 159, 15 Dec. 2017, pp. 53-63

arXiv:1511.02188 [pdf, other]

Efficient Functional-Based Adaptation for CFD Applications

Authors: William C. Tyson, Christopher J. Roy

Abstract: Adjoint methods have gained popularity in recent years for driving adaptation procedures which aim to reduce error in solution functionals. While adjoint methods have been proven effective for functional-based adaptation, the practical implementation of an adjoint method can be quite burdensome since code developers constantly need to ensure and maintain a dual consistent discretization as updates are made. Also, since most engineering problems consider multiple functionals, an adjoint solution must be obtained for each functional of interest which can increase the overall computational cost significantly. In this paper, an alternative to adjoints is presented which uses a sparse approximate inverse of the Jacobian of the residual to obtain approximate adjoint sensitivities for functional-based adaptation indicators. Since the approximate inverse need only be computed once, it can be recycled for any number of functionals making the new approach more efficient than a conventional adjoint method. This new method for functional-based adaptation will be tested using the quasi-1D nozzle problem, and results are presented for functionals of integrated pressure and entropy.

Submitted 6 November, 2015; originally announced November 2015.

arXiv:1508.06315 [pdf, other]

doi 10.1016/j.jcp.2016.07.038

Quantifying and Reducing Model-Form Uncertainties in Reynolds-Averaged Navier-Stokes Simulations: A Data-Driven, Physics-Based Bayesian Approach

Authors: H. Xiao, J. -L. Wu, J. -X. Wang, R. Sun, C. J. Roy

Abstract: Despite their well-known limitations, Reynolds-Averaged Navier-Stokes (RANS) models are still the workhorse tools for turbulent flow simulations in today's engineering application. For many practical flows, the turbulence models are by far the largest source of uncertainty. In this work we develop an open-box, physics-informed Bayesian framework for quantifying model-form uncertainties in RANS simulations. Uncertainties are introduced directly to the Reynolds stresses and are represented with compact parameterization accounting for empirical prior knowledge and physical constraints (e.g., realizability, smoothness, and symmetry). An iterative ensemble Kalman method is used to assimilate the prior knowledge and observation data in a Bayesian framework, and to propagate them to posterior distributions of velocities and other Quantities of Interest (QoIs). We use two representative cases, the flow over periodic hills and the flow in a square duct, to evaluate the performance of the proposed framework. Simulation results suggest that, even with very sparse observations, the posterior mean velocities and other QoIs have significantly better agreement with the benchmark data compared to the baseline results. At most locations the posterior distribution adequately captures the true model error within the developed model form uncertainty bounds. The framework is a major improvement over existing black-box, physics-neutral methods for model-form uncertainty quantification, where prior knowledge and details of the models are not exploited. This approach has potential implications in many fields in which the governing equations are well understood but the model uncertainty comes from unresolved physical processes.

Submitted 8 December, 2016; v1 submitted 25 August, 2015; originally announced August 2015.

Comments: 53 pages, 15 figures

MSC Class: 76F99

Journal ref: Journal of Computational Physics 324 (2016): 115-136

arXiv:1501.03189 [pdf, other]

doi 10.1115/1.4037452

Propagation of Input Uncertainty in Presence of Model-Form Uncertainty: A Multi-fidelity Approach for CFD Applications

Authors: Jian-xun Wang, Christopher J. Roy, Heng Xiao

Abstract: Proper quantification and propagation of uncertainties in computational simulations are of critical importance. This issue is especially challenging for CFD applications. A particular obstacle for uncertainty quantifications in CFD problems is the large model discrepancies associated with the CFD models used for uncertainty propagation. Neglecting or improperly representing the model discrepancies leads to inaccurate and distorted uncertainty distribution for the Quantities of Interest. High-fidelity models, being accurate yet expensive, can accommodate only a small ensemble of simulations and thus lead to large interpolation errors and/or sampling errors; low-fidelity models can propagate a large ensemble, but can introduce large modeling errors. In this work, we propose a multi-model strategy to account for the influences of model discrepancies in uncertainty propagation and to reduce their impact on the predictions. Specifically, we take advantage of CFD models of multiple fidelities to estimate the model discrepancies associated with the lower-fidelity model in the parameter space. A Gaussian process is adopted to construct the model discrepancy function, and a Bayesian approach is used to infer the discrepancies and corresponding uncertainties in the regions of the parameter space where the high-fidelity simulations are not performed. The proposed multi-model strategy combines information from models with different fidelities and computational costs, and is of particular relevance for CFD applications, where a hierarchy of models with a wide range of complexities exists. Several examples of relevance to CFD applications are performed to demonstrate the merits of the proposed strategy. Simulation results suggest that, by combining low- and high-fidelity models, the proposed approach produces better results than what either model can achieve individually.

Submitted 27 March, 2017; v1 submitted 13 January, 2015; originally announced January 2015.

Comments: 18 pages, 8 figures

Journal ref: ASME Journal of Risk and Uncertainty Part B 4(1), 011002, 2018

Showing 1–11 of 11 results for author: Roy, C J