Transformations of Computational Meshes

Abstract: Computational meshes, as a way to partition space, form the basis of much of PDE simulation technology, for instance for the finite element and finite volume discretization methods. In complex simulations, we are often driven to modify an input mesh, for example, to refine, coarsen, extrude, change cell types, or filter it. Mesh manipulation code can be voluminous, error-prone, spread over many special cases, and hard to understand and maintain by subsequent developers. We present a simple, table-driven paradigm for mesh transformation which can execute a large variety of transformations in a performant, parallel manner, along with experiments in the open source library PETSc which can be run by the reader.

Submitted 19 June, 2025; originally announced June 2025.

Comments: 12 pages, 8 figures

arXiv:2401.05868 [pdf, other]

Efficient N-to-M Checkpointing Algorithm for Finite Element Simulations

Authors: David A. Ham, Vaclav Hapla, Matthew G. Knepley, Lawrence Mitchell, Koki Sagiyama

Abstract: In this work, we introduce a new algorithm for N-to-M checkpointing in finite element simulations. This new algorithm allows efficient saving/loading of functions representing physical quantities associated with the mesh representing the physical domain. Specifically, the algorithm allows for using different numbers of parallel processes for saving and loading, allowing for restarting and post-processing on the process count appropriate to the given phase of the simulation and other conditions. For demonstration, we implemented this algorithm in PETSc, the Portable, Extensible Toolkit for Scientific Computation, and added a convenient high-level interface into Firedrake, a system for solving partial differential equations using finite element methods. We evaluated our new implementation by saving and loading data involving 8.2 billion finite element degrees of freedom using 8,192 parallel processes on ARCHER2, the UK National Supercomputing Service.

Submitted 30 October, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

Comments: author accepted manuscript

arXiv:2303.12620 [pdf, other]

A Numerical Study of Landau Damping with PETSc-PIC

Authors: Daniel S. Finn, Matthew G. Knepley, Joseph V. Pusztay, Mark F. Adams

Abstract: We present a study of the standard plasma physics test, Landau damping, using the Particle-In-Cell (PIC) algorithm. The Landau damping phenomenon consists of the damping of small oscillations in plasmas without collisions. In the PIC method, a hybrid discretization is constructed with a grid of finitely supported basis functions to represent the electric, magnetic and/or gravitational fields, and a distribution of delta functions to represent the particle field. Approximations to the dispersion relation are found to be inadequate in accurately calculating values for the electric field frequency and damping rate when parameters of the physical system, such as the plasma frequency or thermal velocity, are varied. We present a full derivation and numerical solution for the dispersion relation, and verify the PETSc-PIC numerical solutions to the Vlasov-Poisson system for a large range of wave numbers and charge densities.

Submitted 22 March, 2023; originally announced March 2023.

Comments: 14 pages, 7 figures

arXiv:2208.07128 [pdf, other]

Tetrahedralization of a Hexahedral Mesh

Authors: Aman Timalsina, Matthew G. Knepley

Abstract: Two important classes of three-dimensional elements in computational meshes are hexahedra and tetrahedra. While several efficient methods exist that convert a hexahedral element to tetrahedral elements, the existing algorithm for tetrahedralization of a hexahedral complex is the marching tetrahedron algorithm, which limits pre-selection of face divisions. We generalize a procedure for tetrahedralizing triangular prisms to tetrahedralizing cubes, and combine it with certain heuristics to design an algorithm that can triangulate any hexahedra.

Submitted 19 January, 2023; v1 submitted 15 August, 2022; originally announced August 2022.

Comments: The previous version had an error in the proof of Observation 2.1, which has since been rectified in this version. Formatting and title changed

arXiv:2201.02806 [pdf, other]

Parallel Metric-Based Mesh Adaptation in PETSc using ParMmg

Authors: Joseph G. Wallwork, Matthew G. Knepley, Nicolas Barral, Matthew D. Piggott

Abstract: This research note documents the integration of the MPI-parallel metric-based mesh adaptation toolkit ParMmg into the solver library PETSc. This coupling brings robust, scalable anisotropic mesh adaptation to a wide community of PETSc users, as well as users of downstream packages. We demonstrate the new functionality via the solution of Poisson problems in three dimensions, with both uniform and spatially-varying right-hand sides.

Submitted 27 July, 2022; v1 submitted 8 January, 2022; originally announced January 2022.

Comments: 5 pages, 2 figures. Appeared as a research note in the 30th International Meshing Roundtable

MSC Class: 35-04 ACM Class: G.4

arXiv:2103.12067 [pdf, other]

doi 10.1177/1094342020966835

Understanding performance variability in standard and pipelined parallel Krylov solvers

Authors: Hannah Morgan, Patrick Sanan, Matthew G. Knepley, Richard Tran Mills

Abstract: In this work, we collect data from runs of Krylov subspace methods and pipelined Krylov algorithms in an effort to understand and model the impact of machine noise and other sources of variability on performance. We find large variability of Krylov iterations between compute nodes for standard methods that is reduced in pipelined algorithms, directly supporting conjecture, as well as large variation between statistical distributions of runtimes across iterations. Based on these results, we improve upon a previously introduced nondeterministic performance model by allowing iterations to fluctuate over time. We present our data from runs of various Krylov algorithms across multiple platforms as well as our updated non-stationary model that provides good agreement with observations. We also suggest how it can be used as a predictive tool.

Submitted 21 March, 2021; originally announced March 2021.

Comments: 18 pages, 12 figures

Journal ref: IJHPCA, 35(1), 2020

arXiv:2004.08729 [pdf, other]

doi 10.1137/20M1332748

Fully Parallel Mesh I/O using PETSc DMPlex with an Application to Waveform Modeling

Authors: Vaclav Hapla, Matthew G. Knepley, Michael Afanasiev, Christian Boehm, Martin van Driel, Lion Krischer, Andreas Fichtner

Abstract: Large-scale PDE simulations using high-order finite-element methods on unstructured meshes are an indispensable tool in science and engineering. The widely used open-source PETSc library offers an efficient representation of generic unstructured meshes within its DMPlex module. This paper details our recent implementation of parallel mesh reading and topological interpolation (computation of edges and faces from a cell-vertex mesh) into DMPlex. We apply these developments to seismic wave propagation scenarios on Mars as an example application. The principal motivation is to overcome single-node memory limits and reach mesh sizes which were impossible before. Moreover, we demonstrate that scalability of I/O and topological interpolation goes beyond 12'000 cores, and memory-imposed limits on mesh size vanish.

Submitted 15 September, 2020; v1 submitted 18 April, 2020; originally announced April 2020.

Comments: 23 pages, 11 figures

MSC Class: 65-04; 65Y05; 65M50; 05C90; 35L05

Journal ref: SIAM J. Sci. Comput. 43 (2021) C127-C153
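
The abstract above describes topological interpolation as the computation of edges and faces from a cell-vertex mesh. As a rough illustration of that idea only, the following serial Python sketch derives the unique edges of a small triangle mesh from its cell-vertex lists. The function name `interpolate_edges` and the data layout are assumptions made for this example; this is not the DMPlex implementation, and it ignores the parallel aspects that the paper addresses.

```python
def interpolate_edges(cells):
    """Derive unique edges from polygonal cells given as vertex tuples.

    Returns (edge_ids, cell_edges): edge_ids maps a sorted vertex pair to
    an edge number, and cell_edges lists the edge numbers of each cell.
    Hypothetical helper for illustration; not a PETSc function.
    """
    edge_ids = {}
    cell_edges = []
    for verts in cells:
        ids = []
        n = len(verts)
        for i in range(n):
            a, b = verts[i], verts[(i + 1) % n]
            key = (min(a, b), max(a, b))  # same key for either orientation
            if key not in edge_ids:
                edge_ids[key] = len(edge_ids)  # number edges in discovery order
            ids.append(edge_ids[key])
        cell_edges.append(ids)
    return edge_ids, cell_edges


# Two triangles sharing the edge between vertices 1 and 2.
cells = [(0, 1, 2), (1, 3, 2)]
edge_ids, cell_edges = interpolate_edges(cells)
```

The shared edge receives a single number because the dictionary key is independent of traversal direction, which is the essential step in turning a cell-vertex description into a mesh with explicit intermediate entities.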

arXiv:1912.08516 [pdf, other]

doi 10.1145/3445791

PCPATCH: software for the topological construction of multigrid relaxation methods

Authors: Patrick E. Farrell, Matthew G. Knepley, Lawrence Mitchell, Florian Wechsung

Abstract: Effective relaxation methods are necessary for good multigrid convergence. For many equations, standard Jacobi and Gauß-Seidel are inadequate, and more sophisticated space decompositions are required; examples include problems with semidefinite terms or saddle point structure. In this paper we present a unifying software abstraction, PCPATCH, for the topological construction of space decompositions for multigrid relaxation methods. Space decompositions are specified by collecting topological entities in a mesh (such as all vertices or faces) and applying a construction rule (such as taking all degrees of freedom in the cells around each entity). The software is implemented in PETSc and facilitates the elegant expression of a wide range of schemes merely by varying solver options at runtime. In turn, this allows for the very rapid development of fast solvers for difficult problems.

Submitted 5 July, 2021; v1 submitted 18 December, 2019; originally announced December 2019.

Comments: 22 pages, minor fixes in bibliography

Journal ref: ACM Transactions on Mathematical Software 47(3):25 (2021)

arXiv:1809.00747 [pdf, other]

A high order hybridizable discontinuous Galerkin method for incompressible miscible displacement in heterogeneous media

Authors: Maurice S. Fabien, Matthew G. Knepley, Beatrice M. Riviere

Abstract: We present a new method for approximating solutions to the incompressible miscible displacement problem in porous media. At the discrete level, the coupled nonlinear system has been split into two linear systems that are solved sequentially. The method is based on a hybridizable discontinuous Galerkin method for the Darcy flow, which produces a mass-conservative flux approximation, and a hybridizable discontinuous Galerkin method for the transport equation. The resulting method is high order accurate. Due to the implicit treatment of the system of partial differential equations, we observe computationally that no slope limiters are needed. Numerical experiments are provided that show that the method converges optimally and is robust for highly heterogeneous porous media in 2D and 3D.

Submitted 16 September, 2018; v1 submitted 3 September, 2018; originally announced September 2018.

arXiv:1808.08328 [pdf, other]

doi 10.1016/j.jcp.2019.02.020

Composable block solvers for the four-field double porosity/permeability model

Authors: M. S. Joshaghani, J. Chang, K. B. Nakshatrala, M. G. Knepley

Abstract: The objective of this paper is twofold. First, we propose two composable block solver methodologies to solve the discrete systems that arise from finite element discretizations of the double porosity/permeability (DPP) model. The DPP model, which is a four-field mathematical model, describes the flow of a single-phase incompressible fluid in a porous medium with two distinct pore-networks and with a possibility of mass transfer between them. Using the composable solvers feature available in PETSc and the finite element libraries available under the Firedrake Project, we illustrate two different ways by which one can effectively precondition these large systems of equations. Second, we employ the recently developed performance model called the Time-Accuracy-Size (TAS) spectrum to demonstrate that the proposed composable block solvers are scalable in both the parallel and algorithmic sense. Moreover, we utilize this spectrum analysis to compare the performance of three different finite element discretizations (classical mixed formulation with H(div) elements, stabilized continuous Galerkin mixed formulation, and stabilized discontinuous Galerkin mixed formulation) for the DPP model. Our performance spectrum analysis demonstrates that the composable block solvers are fine choices for any of these three finite element discretizations. Sample computer codes are provided to illustrate how one can easily implement the proposed block solver methodologies through PETSc command line options.

Submitted 24 August, 2018; originally announced August 2018.

arXiv:1802.07832 [pdf, other]

Comparative study of finite element methods using the Time-Accuracy-Size (TAS) spectrum analysis

Authors: Justin Chang, Maurice S. Fabien, Matthew G. Knepley, Richard T. Mills

Abstract: We present a performance analysis appropriate for comparing algorithms using different numerical discretizations. By taking into account the total time-to-solution, numerical accuracy with respect to an error norm, and the computation rate, a cost-benefit analysis can be performed to determine which algorithm and discretization are particularly suited for an application. This work extends the performance spectrum model in Chang et al. 2017 for interpretation of hardware and algorithmic tradeoffs in numerical PDE simulation. As a proof-of-concept, popular finite element software packages are used to illustrate this analysis for Poisson's equation.

Submitted 21 February, 2018; originally announced February 2018.

MSC Class: 65Y05; 65Y20; 68N99

arXiv:1802.06013 [pdf, other]

A hybridizable discontinuous Galerkin method for two-phase flow in heterogeneous porous media

Authors: Maurice S. Fabien, Matthew G. Knepley, Beatrice M. Riviere

Abstract: We present a new method for simulating incompressible immiscible two-phase flow in porous media. The semi-implicit method decouples the wetting phase pressure and saturation equations. The equations are discretized using a hybridizable discontinuous Galerkin (HDG) method. The proposed method is of high order, conserves global/local mass balance, and the number of globally coupled degrees of freedom is significantly reduced compared to standard interior penalty discontinuous Galerkin methods. Several numerical examples illustrate the accuracy and robustness of the method. These examples include verification of convergence rates by manufactured solutions, common 1D benchmarks and realistic discontinuous permeability fields.

Submitted 16 February, 2018; originally announced February 2018.

Comments: 20 pages, 39 figures, 2 tables

arXiv:1705.03625 [pdf, other]

A performance spectrum for parallel computational frameworks that solve PDEs

Authors: J. Chang, K. B. Nakshatrala, M. G. Knepley, L. Johnsson

Abstract: Important computational physics problems are often large-scale in nature, and it is highly desirable to have robust and high performing computational frameworks that can quickly address these problems. However, it is no trivial task to determine whether a computational framework is performing efficiently or is scalable. The aim of this paper is to present various strategies for better understanding the performance of any parallel computational framework for solving PDEs. Important performance issues that negatively impact time-to-solution are discussed, and we propose a performance spectrum analysis that can enhance one's understanding of critical aforementioned performance issues. As proof of concept, we examine commonly used finite element simulation packages and software and apply the performance spectrum to quickly analyze the performance and scalability across various hardware platforms, software implementations, and numerical discretizations. It is shown that the proposed performance spectrum is a versatile performance model that is not only extendable to more complex PDEs such as hydrostatic ice sheet flow equations, but also useful for understanding hardware performance in a massively parallel computing environment. Potential applications and future extensions of this work are also discussed.

Submitted 14 September, 2017; v1 submitted 10 May, 2017; originally announced May 2017.

arXiv:1702.08880 [pdf, other]

doi 10.1137/17M1118828

Landau Collision Integral Solver with Adaptive Mesh Refinement on Emerging Architectures

Authors: M. F. Adams, E. Hirvijoki, M. G. Knepley, J. Brown, T. Isaac, R. Mills

Abstract: The Landau collision integral is an accurate model for the small-angle dominated Coulomb collisions in fusion plasmas. We investigate a high order accurate, fully conservative, finite element discretization of the nonlinear multi-species Landau integral with adaptive mesh refinement using the PETSc library (www.mcs.anl.gov/petsc). We develop algorithms and techniques to efficiently utilize emerging architectures with an approach that minimizes memory usage and movement and is suitable for vector processing. The Landau collision integral is vectorized with Intel AVX-512 intrinsics and the solver sustains as much as 22% of the theoretical peak flop rate of the Second Generation Intel Xeon Phi, Knights Landing, processor.

Submitted 28 February, 2017; v1 submitted 27 February, 2017; originally announced February 2017.

Journal ref: SIAM Journal on Scientific Computing, 39 (6), 2017

arXiv:1610.09874 [pdf, other]

Anisotropic mesh adaptation in Firedrake with PETSc DMPlex

Authors: Nicolas Barral, Matthew G. Knepley, Michael Lange, Matthew D. Piggott, Gerard J. Gorman

Abstract: Despite decades of research in this area, mesh adaptation capabilities are still rarely found in numerical simulation software. We postulate that the primary reason for this is lack of usability. Integrating mesh adaptation into existing software is difficult as non-trivial operators, such as error metrics and interpolation operators, are required, and integrating available adaptive remeshers is not straightforward. Our approach presented here is to first integrate Pragmatic, an anisotropic mesh adaptation library, into DMPlex, a PETSc object that manages unstructured meshes and their interactions with PETSc's solvers and I/O routines. As PETSc is already widely used, this will make anisotropic mesh adaptation available to a much larger community. As a demonstration of this we describe the integration of anisotropic mesh adaptation into Firedrake, an automated Finite Element based system for the portable solution of partial differential equations which already uses PETSc solvers and I/O via DMPlex. We present a proof of concept of this integration with a three-dimensional advection test case.

Submitted 31 October, 2016; originally announced October 2016.

Comments: 5 pages, 2 figures, Proceedings of the 25th International Meshing Roundtable, ed. Steve Owen and Hang Si, 2016

arXiv:1607.04254 [pdf, other]

doi 10.1137/130936725

Composing Scalable Nonlinear Algebraic Solvers

Authors: Peter R. Brune, Matthew G. Knepley, Barry F. Smith, Xuemin Tu

Abstract: Most efficient linear solvers use composable algorithmic components, with the most common model being the combination of a Krylov accelerator and one or more preconditioners. A similar set of concepts may be used for nonlinear algebraic systems, where nonlinear composition of different nonlinear solvers may significantly improve the time to solution. We describe the basic concepts of nonlinear composition and preconditioning and present a number of solvers applicable to nonlinear partial differential equations. We have developed a software framework in order to easily explore the possible combinations of solvers. We show that the performance gains from using composed solvers can be substantial compared with gains from standard Newton-Krylov methods.

Submitted 14 July, 2016; originally announced July 2016.

Comments: 29 pages, 14 figures, 13 tables

MSC Class: 65F08; 65Y05; 65Y20; 68W10

Journal ref: SIAM Review 57(4), 535-565, 2015

arXiv:1607.04245 [pdf, other]

Finite Element Integration with Quadrature on the GPU

Authors: Matthew G. Knepley, Karl Rupp, Andy R. Terrel

Abstract: We present a novel, quadrature-based finite element integration method for low-order elements on GPUs, using a pattern we call thread transposition to avoid reductions while vectorizing aggressively. On the NVIDIA GTX580, which has a nominal single precision peak flop rate of 1.5 TF/s and a memory bandwidth of 192 GB/s, we achieve close to 300 GF/s for element integration on first-order discretization of the Laplacian operator with variable coefficients in two dimensions, and over 400 GF/s in three dimensions. From our performance model we find that this corresponds to 90% of our measured achievable bandwidth peak of 310 GF/s. Further experimental results also match the predicted performance when used with double precision (120 GF/s in two dimensions, 150 GF/s in three dimensions). Results obtained for the linear elasticity equations (220 GF/s and 70 GF/s in two dimensions, 180 GF/s and 60 GF/s in three dimensions) also demonstrate the applicability of our method to vector-valued partial differential equations.

Submitted 14 July, 2016; originally announced July 2016.

Comments: 14 pages, 6 figures

ACM Class: G.4; G.1.8

arXiv:1604.07163 [pdf, other]

Extreme-scale Multigrid Components within PETSc

Authors: Dave A. May, Patrick Sanan, Karl Rupp, Matthew G. Knepley, Barry F. Smith

Abstract: Elliptic partial differential equations (PDEs) frequently arise in continuum descriptions of physical processes relevant to science and engineering. Multilevel preconditioners represent a family of scalable techniques for solving discrete PDEs of this type and thus are the method of choice for high-resolution simulations. The scalability and time-to-solution of massively parallel multilevel preconditioners can be adversely affected by using a coarse-level solver with sub-optimal algorithmic complexity. To maintain scalability, agglomeration techniques applied to the coarse level have been shown to be necessary. In this work, we present a new software component introduced within the Portable Extensible Toolkit for Scientific Computation (PETSc) which permits agglomeration. We provide an overview of the design and implementation of this functionality, together with several use cases highlighting the benefits of agglomeration. Lastly, we demonstrate via numerical experiments employing geometric multigrid with structured meshes, the flexibility and performance gains possible using our MPI-rank agglomeration implementation.

Submitted 25 April, 2016; originally announced April 2016.

arXiv:1602.04873 [pdf, other]

A Stochastic Performance Model for Pipelined Krylov Methods

Authors: Hannah Morgan, Matthew G. Knepley, Patrick Sanan, L. Ridgway Scott

Abstract: Pipelined Krylov methods seek to ameliorate the latency due to inner products necessary for projection by overlapping it with the computation associated with sparse matrix-vector multiplication. We clarify a folk theorem that this can only result in a speedup of $2\times$ over the naive implementation. Examining many repeated runs, we show that stochastic noise also contributes to the latency, and we model this using an analytical probability distribution. Our analysis shows that speedups greater than $2\times$ are possible with these algorithms.

Submitted 15 February, 2016; originally announced February 2016.
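
The $2\times$ bound mentioned in the abstract above can be illustrated with a simple deterministic model. This is a sketch under our own assumptions, not the paper's analysis: the symbols $t_c$ (per-iteration local computation time) and $t_r$ (per-iteration reduction latency) are introduced here for illustration, both are taken to be fixed, and overlap is assumed to be perfect.

```latex
\begin{align*}
T_{\text{naive}} &= t_c + t_r, \\
T_{\text{pipe}}  &= \max(t_c, t_r), \\
S = \frac{T_{\text{naive}}}{T_{\text{pipe}}}
  &= \frac{t_c + t_r}{\max(t_c, t_r)}
   = 1 + \frac{\min(t_c, t_r)}{\max(t_c, t_r)} \le 2 .
\end{align*}
```

Equality holds only when $t_c = t_r$. The bound depends on the timings being deterministic; the abstract reports that once stochastic noise is modeled, speedups greater than $2\times$ become possible.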

arXiv:1508.02470 [pdf, other]

Support for Non-conformal Meshes in PETSc's DMPlex Interface

Authors: Tobin Isaac, Matthew G. Knepley

Abstract: PETSc's DMPlex interface for unstructured meshes has been extended to support non-conformal meshes. The topological construct that DMPlex implements, the CW-complex, is by definition conformal, so representing non-conformal meshes in a way that hides complexity requires careful attention to the interface between DMPlex and numerical methods such as the finite element method. Our approach, which combines a tree structure for subset-superset relationships and a "reference tree" describing the types of non-conformal interfaces, allows finite element code written for conformal meshes to extend automatically: in particular, all "hanging-node" constraint calculations are handled behind the scenes. We give example code demonstrating the use of this extension, and use it to convert forests of quadtrees and forests of octrees from the p4est library to DMPlex meshes.

Submitted 10 August, 2015; originally announced August 2015.

Comments: 16 pages, 13 figures, 5 code examples

arXiv:1506.07749 [pdf, other]

doi 10.1137/15M1026092

Efficient mesh management in Firedrake using PETSc-DMPlex

Authors: Michael Lange, Lawrence Mitchell, Matthew G. Knepley, Gerard J. Gorman

Abstract: The use of composable abstractions allows the application of new and established algorithms to a wide range of problems while automatically inheriting the benefits of well-known performance optimisations. This work highlights the composition of the PETSc DMPlex domain topology abstraction with the Firedrake automated finite element system to create a PDE solving environment that combines expressiveness, flexibility and high performance. We describe how Firedrake utilises DMPlex to provide the indirection maps required for finite element assembly, while supporting various mesh input formats and runtime domain decomposition. In particular, we describe how DMPlex and its accompanying data structures allow the generic creation of user-defined discretisations, while utilising data layout optimisations that improve cache coherency and ensure overlapped communication during assembly computation.

Submitted 25 June, 2015; originally announced June 2015.

Comments: 12 pages, 6 figures, submitted to SISC CSE Special Issue

Journal ref: SIAM Journal on Scientific Computing 38(5):S143-S155 (2016)

arXiv:1506.06194 [pdf, other]

Unstructured Overlapping Mesh Distribution in Parallel

Authors: Matthew G. Knepley, Michael Lange, Gerard J. Gorman

Abstract: We present a simple mathematical framework and API for parallel mesh and data distribution, load balancing, and overlap generation. It relies on viewing the mesh as a Hasse diagram, abstracting away information such as cell shape, dimension, and coordinates. The high level of abstraction makes our interface both concise and powerful, as the same algorithm applies to any representable mesh, such as hybrid meshes, meshes embedded in higher dimension, and overlapped meshes in parallel. We present evidence, both theoretical and experimental, that the algorithms are scalable and efficient. A working implementation can be found in the latest release of the PETSc libraries.

Submitted 19 June, 2015; originally announced June 2015.

Comments: 14 pages, 6 figures, submitted to TOMS

arXiv:1505.04633 [pdf, other]

Flexible, Scalable Mesh and Data Management using PETSc DMPlex

Authors: Michael Lange, Matthew G. Knepley, Gerard J. Gorman

Abstract: Designing a scientific software stack to meet the needs of the next generation of mesh-based simulation demands not only scalable and efficient mesh and data management on a wide range of platforms, but also an abstraction layer that makes it useful for a wide range of application codes. Common utility tasks, such as file I/O, mesh distribution, and work partitioning, should be delegated to external libraries in order to promote code re-use, extensibility and software interoperability. In this paper we demonstrate the use of PETSc's DMPlex data management API to perform mesh input and domain partitioning in Fluidity, a large scale CFD application. We demonstrate that raising the level of abstraction adds new functionality to the application code, such as support for additional mesh file formats and mesh re-ordering, while improving simulation startup cost through more efficient mesh distribution. Moreover, the separation of concerns accomplished through this interface shifts critical performance and interoperability issues, such as scalable I/O and file format support, to a widely used and supported open source community library, improving the sustainability, performance, and functionality of Fluidity.

Submitted 18 May, 2015; originally announced May 2015.

Comments: 6 pages, 6 figures, to appear in EASC 2015

arXiv:1409.7418 [pdf, other]

doi 10.1063/1.4897324

Modeling Charge-Sign Asymmetric Solvation Free Energies With Nonlinear Boundary Conditions

Authors: Jaydeep P. Bardhan, Matthew G. Knepley

Abstract: We show that charge-sign-dependent asymmetric hydration can be modeled accurately using linear Poisson theory but replacing the standard electric-displacement boundary condition with a simple nonlinear boundary condition. Using a single multiplicative scaling factor to determine atomic radii from molecular dynamics Lennard-Jones parameters, the new model accurately reproduces MD free-energy calculations of hydration asymmetries for (i) monatomic ions, (ii) titratable amino acids in both their protonated and unprotonated states, and (iii) the Mobley "bracelet" and "rod" test problems [J. Phys. Chem. B, v. 112:2408, 2008]. Remarkably, the model also justifies the use of linear response expressions for charging free energies. Our boundary-element method implementation demonstrates the ease with which other continuum-electrostatic solvers can be extended to include asymmetry.

Submitted 25 September, 2014; originally announced September 2014.

Comments: 7 pages, 2 figures, accepted to Journal of Chemical Physics

arXiv:1407.2905 [pdf, ps, other]

Run-time extensibility and librarization of simulation software

Authors: Jed Brown, Matthew G. Knepley, Barry F. Smith

Abstract: Build-time configuration and environment assumptions are hampering progress and usability in scientific software. That which would be utterly unacceptable in non-scientific software somehow passes for the norm in scientific packages. The community needs reusable software packages that are easy to use and flexible enough to accommodate next-generation simulation and analysis demands.

Submitted 10 July, 2014; originally announced July 2014.

Comments: 6 pages

arXiv:1309.1204 [pdf, other]

Achieving High Performance with Unified Residual Evaluation

Authors: Matthew G. Knepley, Jed Brown, Karl Rupp, Barry F. Smith

Abstract: We examine residual evaluation, perhaps the most basic operation in numerical simulation. By raising the level of abstraction in this operation, we can eliminate specialized code, enable optimization, and greatly increase the extensibility of existing code.

Submitted 6 September, 2013; v1 submitted 4 September, 2013; originally announced September 2013.

Comments: 4 pages, 1 figure

arXiv:1308.5846 [pdf, other]

doi 10.1002/jgrb.50217

A Domain Decomposition Approach to Implementing Fault Slip in Finite-Element Models of Quasi-static and Dynamic Crustal Deformation

Authors: Brad T. Aagaard, Matthew G. Knepley, Charles A. Williams

Abstract: We employ a domain decomposition approach with Lagrange multipliers to implement fault slip in a finite-element code, PyLith, for use in both quasi-static and dynamic crustal deformation applications. This integrated approach to solving both quasi-static and dynamic simulations leverages common finite-element data structures and implementations of various boundary conditions, discretization schemes, and bulk and fault rheologies. We have developed a custom preconditioner for the Lagrange multiplier portion of the system of equations that provides excellent scalability with problem size compared to conventional additive Schwarz methods. We demonstrate application of this approach using benchmarks for both quasi-static viscoelastic deformation and dynamic spontaneous rupture propagation that verify the numerical implementation in PyLith.

Submitted 27 August, 2013; originally announced August 2013.

Comments: 14 pages, 15 figures

Journal ref: Journal of Geophysical Research, 118(6), pp.3059-3079, 2013

arXiv:1209.1711 [pdf, ps, other]

doi 10.1007/978-3-540-70529-1

Programming Languages for Scientific Computing

Authors: Matthew G. Knepley

Abstract: Scientific computation is a discipline that combines numerical analysis, physical understanding, algorithm development, and structured programming. Several yottacycles per year on the world's largest computers are spent simulating problems as diverse as weather prediction, the properties of material composites, the behavior of biomolecules in solution, and the quantum nature of chemical compounds. This article is intended to review specific language features and their use in computational science. We will review the strengths and weaknesses of different programming styles, with examples taken from widely used scientific codes.

Submitted 9 January, 2018; v1 submitted 8 September, 2012; originally announced September 2012.

Comments: 21 pages

Journal ref: Encyclopedia of Applied and Computational Mathematics, Springer, 2012

arXiv:1208.3866 [pdf, ps, other]

Analytical Nonlocal Electrostatics Using Eigenfunction Expansions of Boundary-Integral Operators

Authors: Jaydeep P. Bardhan, Matthew G. Knepley, Peter R. Brune

Abstract: In this paper, we present an analytical solution to nonlocal continuum electrostatics for an arbitrary charge distribution in a spherical solute. Our approach relies on two key steps: (1) re-formulating the PDE problem using boundary-integral equations, and (2) diagonalizing the boundary-integral operators using the fact that their eigenfunctions are the surface spherical harmonics. To introduce this uncommon approach for analytical calculations in separable geometries, we rederive Kirkwood's classic results for a protein surrounded concentrically by a pure-water ion-exclusion layer and then a dilute electrolyte (modeled with the linearized Poisson--Boltzmann equation). Our main result, however, is an analytical method for calculating the reaction potential in a protein embedded in a nonlocal-dielectric solvent, the Lorentz model studied by Dogonadze and Kornyshev. The analytical method enables biophysicists to study the new nonlocal theory in a simple, computationally fast way; an open-source MATLAB implementation is included as supplemental information.

Submitted 20 August, 2012; v1 submitted 19 August, 2012; originally announced August 2012.

Comments: 19 pages, 7 figures

arXiv:1204.0267 [pdf, ps, other]

doi 10.1088/1749-4699/5/1/014006

Computational science and re-discovery: open-source implementations of ellipsoidal harmonics for problems in potential theory

Authors: Jaydeep P. Bardhan, Matthew G. Knepley

Abstract: We present two open-source (BSD) implementations of ellipsoidal harmonic expansions for solving problems of potential theory using separation of variables. Ellipsoidal harmonics are used surprisingly infrequently, considering their substantial value for problems ranging in scale from molecules to the entire solar system. In this article, we suggest two possible reasons for the paucity relative to spherical harmonics. The first is essentially historical---ellipsoidal harmonics developed during the late 19th century and early 20th, when it was found that only the lowest-order harmonics are expressible in closed form. Each higher-order term requires the solution of an eigenvalue problem, and tedious manual computation seems to have discouraged applications and theoretical studies. The second explanation is practical: even with modern computers and accurate eigenvalue algorithms, expansions in ellipsoidal harmonics are significantly more challenging to compute than those in Cartesian or spherical coordinates. The present implementations reduce the "barrier to entry" by providing an easy and free way for the community to begin using ellipsoidal harmonics in actual research. We demonstrate our implementation using the specific and physiologically crucial problem of how charged proteins interact with their environment, and ask: what other analytical tools await re-discovery in an era of inexpensive computation?

Submitted 3 April, 2012; v1 submitted 1 April, 2012; originally announced April 2012.

Comments: 25 pages, 3 figures

Journal ref: Computational Science & Discovery, 5:014006, 2012

arXiv:1111.6583 [pdf, other]

doi 10.1137/110856976

PyClaw: Accessible, Extensible, Scalable Tools for Wave Propagation Problems

Authors: David I. Ketcheson, Kyle T. Mandli, Aron Ahmadia, Amal Alghamdi, Manuel Quezada, Matteo Parsani, Matthew G. Knepley, Matthew Emmett

Abstract: Development of scientific software involves tradeoffs between ease of use, generality, and performance. We describe the design of a general hyperbolic PDE solver that can be operated with the convenience of MATLAB yet achieves efficiency near that of hand-coded Fortran and scales to the largest supercomputers. This is achieved by using Python for most of the code while employing automatically-wrapped Fortran kernels for computationally intensive routines, and using Python bindings to interface with a parallel computing library and other numerical packages. The software described here is PyClaw, a Python-based structured grid solver for general systems of hyperbolic PDEs \cite{pyclaw}. PyClaw provides a powerful and intuitive interface to the algorithms of the existing Fortran codes Clawpack and SharpClaw, simplifying code development and use while providing massive parallelism and scalable solvers via the PETSc library. The package is further augmented by use of PyWENO for generation of efficient high-order weighted essentially non-oscillatory reconstruction code. The simplicity, capability, and performance of this approach are demonstrated through application to example problems in shallow water flow, compressible flow and elasticity.

Submitted 12 May, 2012; v1 submitted 27 November, 2011; originally announced November 2011.

Journal ref: SISC 34(4):C210-C231 (2012)

arXiv:1109.0651 [pdf, ps, other]

doi 10.1063/1.3641485

Mathematical Analysis of the BIBEE Approximation for Molecular Solvation: Exact Results for Spherical Inclusions

Authors: Jaydeep P. Bardhan, Matthew G. Knepley

Abstract: We analyze the mathematically rigorous BIBEE (boundary-integral based electrostatics estimation) approximation of the mixed-dielectric continuum model of molecular electrostatics, using the analytically solvable case of a spherical solute containing an arbitrary charge distribution. Our analysis, which builds on Kirkwood's solution using spherical harmonics, clarifies important aspects of the approximation and its relationship to Generalized Born models. First, our results suggest a new perspective for analyzing fast electrostatic models: the separation of variables between material properties (the dielectric constants) and geometry (the solute dielectric boundary and charge distribution). Second, we find that the eigenfunctions of the reaction-potential operator are exactly preserved in the BIBEE model for the sphere, which supports the use of this approximation for analyzing charge-charge interactions in molecular binding. Third, a comparison of BIBEE to the recent GB$ε$ theory suggests a modified BIBEE model capable of predicting electrostatic solvation free energies to within 4% of a full numerical Poisson calculation. This modified model leads to a projection-framework understanding of BIBEE and suggests opportunities for future improvements.

Submitted 3 September, 2011; originally announced September 2011.

Comments: 33 pages, 5 figures

Journal ref: Journal of Chemical Physics, 135(12):124107-124117, 2011

arXiv:1107.5951 [pdf, other]

doi 10.1111/j.1365-246X.2011.05167.x

Optimal, scalable forward models for computing gravity anomalies

Authors: Dave A. May, Matthew G. Knepley

Abstract: We describe three approaches for computing a gravity signal from a density anomaly. The first approach consists of the classical "summation" technique, whilst the remaining two methods solve the Poisson problem for the gravitational potential using either a Finite Element (FE) discretization employing a multilevel preconditioner, or a Green's function evaluated with the Fast Multipole Method (FMM). The methods utilizing the PDE formulation described here differ from previously published approaches used in gravity modeling in that they are optimal, implying that both the memory and computational time required scale linearly with respect to the number of unknowns in the potential field. Additionally, all of the implementations presented here are developed such that the computations can be performed in a massively parallel, distributed memory computing environment. Through numerical experiments, we compare the methods on the basis of their discretization error, CPU time and parallel scalability. We demonstrate the parallel scalability of all these techniques by running forward models with up to $10^8$ voxels on 1000's of cores.

Submitted 29 July, 2011; originally announced July 2011.

Comments: 38 pages, 13 figures; accepted by Geophysical Journal International

Journal ref: Geophysical Journal International, 187(1):161-177, 2011

arXiv:1104.0261 [pdf, other]

Unstructured Geometric Multigrid in Two and Three Dimensions on Complex and Graded Meshes

Authors: Peter R. Brune, Matthew G. Knepley, L. Ridgway Scott

Abstract: The use of multigrid and related preconditioners with the finite element method is often limited by the difficulty of applying the algorithm effectively to a problem, especially when the domain has a complex shape or adaptive refinement. We introduce a simplification of a general topologically-motivated mesh coarsening algorithm for use in creating hierarchies of meshes for geometric unstructured multigrid methods. The connections between the guarantees of this technique and the quality criteria necessary for multigrid methods for non-quasi-uniform problems are noted. The implementation details, in particular those related to coarsening, remeshing, and interpolation, are discussed. Computational tests on pathological test cases from adaptive finite element methods show the performance of the technique.

Submitted 5 April, 2011; v1 submitted 1 April, 2011; originally announced April 2011.

Comments: 17 pages, 5 figures, 4 tables

MSC Class: 65N30; 65M50; 65M55

Journal ref: SIAM Journal on Scientific Computing, 35(1), A173-A191, 2013

arXiv:1103.0066 [pdf, other]

doi 10.1145/2427023.2427027

Finite Element Integration on GPUs

Authors: Matthew G. Knepley, Andy R. Terrel

Abstract: We present a novel finite element integration method for low order elements on GPUs. We achieve more than 100GF for element integration on first order discretizations of both the Laplacian and Elasticity operators.

Submitted 28 February, 2011; originally announced March 2011.

Comments: 16 pages, 3 figures

ACM Class: G.4; G.1.8

Journal ref: ACM Transactions on Mathematical Software, 39(2), 2013

arXiv:1008.2410 [pdf, other]

Removing the Barrier to Scalability in Parallel FMM

Authors: Matthew G. Knepley

Abstract: The Fast Multipole Method (FMM) is well known to possess a bottleneck arising from decreasing workload on higher levels of the FMM tree [Greengard and Gropp, Comp. Math. Appl., 20(7), 1990]. We show that this potential bottleneck can be eliminated by overlapping multipole and local expansion computations with direct kernel evaluations on the finest level grid.

Submitted 13 August, 2010; originally announced August 2010.

Comments: 11 pages, 2 figures

arXiv:1007.4591 [pdf, other]

doi 10.1016/j.cpc.2011.02.013

Biomolecular electrostatics using a fast multipole BEM on up to 512 GPUs and a billion unknowns

Authors: Rio Yokota, Jaydeep P. Bardhan, Matthew G. Knepley, L. A. Barba, Tsuyoshi Hamada

Abstract: We present teraflop-scale calculations of biomolecular electrostatics enabled by the combination of algorithmic and hardware acceleration. The algorithmic acceleration is achieved with the fast multipole method (FMM) in conjunction with a boundary element method (BEM) formulation of the continuum electrostatic model, as well as the BIBEE approximation to BEM. The hardware acceleration is achieved through graphics processors, GPUs. We demonstrate the power of our algorithms and software for the calculation of the electrostatic interactions between biological molecules in solution. The applications demonstrated include the electrostatics of protein--drug binding and several multi-million atom systems consisting of hundreds to thousands of copies of lysozyme molecules. The parallel scalability of the software was studied in a cluster at the Nagasaki Advanced Computing Center, using 128 nodes, each with 4 GPUs. Delicate tuning has resulted in strong scaling with parallel efficiency of 0.8 for 256 and 0.5 for 512 GPUs. The largest application run, with over 20 million atoms and one billion unknowns, required only one minute on 512 GPUs. We are currently adapting our BEM software to solve the linearized Poisson-Boltzmann equation for dilute ionic solutions, and it is also designed to be flexible enough to be extended for a variety of integral equation problems, ranging from Poisson problems to Helmholtz problems in electromagnetics and acoustics to high Reynolds number flow.

Submitted 10 February, 2011; v1 submitted 26 July, 2010; originally announced July 2010.

Journal ref: Comput. Phys. Commun., 182(6):1271-1283 (2011)

arXiv:0909.5413 [pdf, ps, other]

doi 10.1016/j.cma.2010.02.008

PetRBF--A parallel O(N) algorithm for radial basis function interpolation

Authors: Rio Yokota, L. A. Barba, Matthew G. Knepley

Abstract: We have developed a parallel algorithm for radial basis function (RBF) interpolation that exhibits O(N) complexity, requires O(N) storage, and scales excellently up to a thousand processes. The algorithm uses a GMRES iterative solver with a restricted additive Schwarz method (RASM) as a preconditioner and a fast matrix-vector algorithm. Previous fast RBF methods, achieving at most O(N log N) complexity, were developed using multiquadric and polyharmonic basis functions. In contrast, the present method uses Gaussians with a small variance (a common choice in particle methods for fluid simulation, our main target application). The fast decay of the Gaussian basis function allows rapid convergence of the iterative solver even when the subdomains in the RASM are very small. The present method was implemented in parallel using the PETSc library (developer version). Numerical experiments demonstrate its capability in problems of RBF interpolation with more than 50 million data points, timing at 106 seconds (19 iterations for an error tolerance of 10^-15) on 1024 processors of a Blue Gene/L (700 MHz PowerPC processors). The parallel code is freely available in the open-source model.

Submitted 29 September, 2009; originally announced September 2009.

Comments: Submitted to Computer Methods in Applied Mechanics and Engineering

Journal ref: Computer Methods in Applied Mechanics and Engineering, 199(25-28), pp. 1793-1804, 2010

arXiv:0908.4427 [pdf, other]

doi 10.3233/SPR-2009-0249

Mesh Algorithms for PDE with Sieve I: Mesh Distribution

Authors: Matthew G. Knepley, Dmitry A. Karpeev

Abstract: We have developed a new programming framework, called Sieve, to support parallel numerical PDE algorithms operating over distributed meshes. We have also developed a reference implementation of Sieve in C++ as a library of generic algorithms operating on distributed containers conforming to the Sieve interface. Sieve makes instances of the incidence relation, or \emph{arrows}, the conceptual first-class objects represented in the containers. Further, generic algorithms acting on this arrow container are systematically used to provide natural geometric operations on the topology and also, through duality, on the data. Finally, coverings and duality are used to encode not only individual meshes, but all types of hierarchies underlying PDE data structures, including multigrid and mesh partitions. In order to demonstrate the usefulness of the framework, we show how the mesh partition data can be represented and manipulated using the same fundamental mechanisms used to represent meshes. We present the complete description of an algorithm to encode a mesh partition and then distribute a mesh, which is independent of the mesh dimension, element shape, or embedding. Moreover, data associated with the mesh can be similarly distributed with exactly the same algorithm. The use of a high level of abstraction within the Sieve leads to several benefits in terms of code reuse, simplicity, and extensibility. We discuss these benefits and compare our approach to other existing mesh libraries.

Submitted 30 August, 2009; originally announced August 2009.

Comments: 36 pages, 22 figures

ACM Class: G.1.8; G.4; J.2; E.2

Journal ref: Scientific Programming, 17(3), 215-230, 2009

arXiv:0905.2637 [pdf, other]

doi 10.1002/nme.2972

PetFMM--A dynamically load-balancing parallel fast multipole library

Authors: Felipe A. Cruz, Matthew G. Knepley, L. A. Barba

Abstract: Fast algorithms for the computation of $N$-body problems can be broadly classified into mesh-based interpolation methods, and hierarchical or multiresolution methods. To this last class belongs the well-known fast multipole method (FMM), which offers O(N) complexity. This paper presents an extensible parallel library for $N$-body interactions utilizing the FMM algorithm, built on the framework of PETSc. A prominent feature of this library is that it is designed to be extensible, with a view to unifying efforts involving many algorithms based on the same principles as the FMM and enabling easy development of scientific application codes. The paper also details an exhaustive model for the computation of tree-based $N$-body algorithms in parallel, including both work estimates and communications estimates. With this model, we are able to implement a method to provide automatic, a priori load balancing of the parallel execution, achieving optimal distribution of the computational work among processors and minimal inter-processor communications. Using a client application that performs the calculation of velocity induced by $N$ vortex particles, ample verification and testing of the library was performed. Strong scaling results are presented with close to a million particles in up to 64 processors, including both speedup and parallel efficiency. The library is currently able to achieve over 85% parallel efficiency for 64 processors. The software library is open source under the PETSc license; this guarantees the maximum impact to the scientific community and encourages peer-based collaboration for the extensions and applications.

Submitted 15 May, 2009; originally announced May 2009.

Comments: 28 pages, 9 figures

Journal ref: Int. J. Num. Meth. Eng., 85(4): 403-428 (Jan. 2011)

Showing 1–40 of 40 results for author: Knepley, M G