Search | arXiv e-print repository

Trilinos: Enabling Scientific Computing Across Diverse Hardware Architectures at Scale

Authors: Matthias Mayr, Alexander Heinlein, Christian Glusa, Siva Rajamanickam, Maarten Arnst, Roscoe Bartlett, Luc Berger-Vergiat, Erik Boman, Karen Devine, Graham Harper, Michael Heroux, Mark Hoemmen, Jonathan Hu, Brian Kelley, Kyungjoo Kim, Drew P. Kouri, Paul Kuberry, Kim Liegeois, Curtis C. Ober, Roger Pawlowski, Carl Pearson, Mauro Perego, Eric Phipps, Denis Ridzal, Nathan V. Roberts, et al. (8 additional authors not shown)

Abstract: Trilinos is a community-developed, open-source software framework that facilitates building large-scale, complex, multiscale, multiphysics simulation code bases for scientific and engineering problems. Since the Trilinos framework has undergone substantial changes to support new applications and new hardware architectures, this document is an update to "An Overview of the Trilinos project" by Heroux et al. (ACM Transactions on Mathematical Software, 31(3):397-423, 2005). It describes the design of Trilinos, introduces its new organization in product areas, and highlights established and new features available in Trilinos. Particular focus is put on the modernized software stack based on the Kokkos ecosystem to deliver performance portability across heterogeneous hardware architectures. This paper also outlines the organization of the Trilinos community and the contribution model to help onboard interested users and contributors.

Submitted 11 March, 2025; originally announced March 2025.

Comments: 32 pages, 1 figure

Report number: SAND2025-02891O

MSC Class: 65-04; 65Y05

ACM Class: G.4; G.1.3

arXiv:2301.11402

A Hybrid Deep Neural Operator/Finite Element Method for Ice-Sheet Modeling

Authors: QiZhi He, Mauro Perego, Amanda A. Howard, George Em Karniadakis, Panos Stinis

Abstract: One of the most challenging and consequential problems in climate modeling is to provide probabilistic projections of sea level rise. A large part of the uncertainty of sea level projections is due to uncertainty in ice sheet dynamics. At the moment, accurate quantification of the uncertainty is hindered by the cost of ice sheet computational models. In this work, we develop a hybrid approach to approximate existing ice sheet computational models at a fraction of their cost. Our approach consists of replacing the finite element model for the momentum equations for the ice velocity, the most expensive part of an ice sheet model, with a Deep Operator Network, while retaining a classic finite element discretization for the evolution of the ice thickness. We show that the resulting hybrid model is very accurate and an order of magnitude faster than the traditional finite element model. Further, a distinctive feature of the proposed model compared to other neural network approaches is that it can handle high-dimensional parameter spaces (parameter fields) such as the basal friction at the bed of the glacier, and can therefore be used for generating samples for uncertainty quantification. We study the impact of hyper-parameters, number of unknowns and correlation length of the parameter distribution on the training and accuracy of the Deep Operator Network on a synthetic ice sheet model.
We then target the evolution of the Humboldt glacier in Greenland and show that our hybrid model can provide accurate statistics of the glacier mass loss and can be effectively used to accelerate the quantification of uncertainty.

Submitted 26 January, 2023; originally announced January 2023.
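The abstract above describes replacing the finite element velocity solve with a Deep Operator Network (DeepONet). As an illustration only, the sketch below shows the generic unstacked DeepONet structure: a branch network encodes an input field sampled at fixed sensor points, a trunk network encodes a query location y, and the output is G(u)(y) ≈ Σ_k b_k(u) t_k(y) + b_0. The names TinyDeepONet, dense and init_layer, the layer sizes, and the random untrained weights are all assumptions for this sketch; it does not reproduce the network, inputs, or training used in the paper.

```python
import math
import random


def dense(weights, biases, x, activation=None):
    """Fully connected layer y = W x + b, optionally followed by an activation."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


def init_layer(rng, n_in, n_out):
    """Random Gaussian weights and zero biases (untrained, illustration only)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class TinyDeepONet:
    """Hypothetical minimal DeepONet: G(u)(y) ~ sum_k branch_k(u) * trunk_k(y) + b0."""

    def __init__(self, n_sensors, y_dim, width=16, p=8, seed=0):
        rng = random.Random(seed)
        self.branch = [init_layer(rng, n_sensors, width), init_layer(rng, width, p)]
        self.trunk = [init_layer(rng, y_dim, width), init_layer(rng, width, p)]
        self.b0 = 0.0

    @staticmethod
    def _mlp(layers, x):
        # Two-layer MLP: tanh hidden layer, linear output layer with p features.
        (w1, b1), (w2, b2) = layers
        return dense(w2, b2, dense(w1, b1, x, math.tanh))

    def __call__(self, u_sensors, y):
        b = self._mlp(self.branch, u_sensors)  # encodes the sampled input field
        t = self._mlp(self.trunk, y)           # encodes the query coordinate y
        return sum(bk * tk for bk, tk in zip(b, t)) + self.b0


# Usage: an input field sampled at 20 sensor points, queried at one 2-D location.
sensors = [math.sin(0.1 * i) for i in range(20)]
model = TinyDeepONet(n_sensors=20, y_dim=2)
value = model(sensors, [0.5, 0.25])
```

In a hybrid model of the kind described, a trained surrogate like this would stand in for the velocity solve at each time step while the thickness update remains a finite element computation; training is omitted here.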

arXiv:2204.09157

DOI: 10.1016/j.jcp.2023.112462

Multifidelity Deep Operator Networks For Data-Driven and Physics-Informed Problems

Authors: Amanda A. Howard, Mauro Perego, George E. Karniadakis, Panos Stinis

Abstract: Operator learning for complex nonlinear systems is increasingly common in modeling multi-physics and multi-scale systems. However, training such high-dimensional operators requires a large amount of expensive, high-fidelity data, either from experiments or simulations. In this work, we present a composite Deep Operator Network (DeepONet) for learning using two datasets with different levels of fidelity to accurately learn complex operators when sufficient high-fidelity data is not available. Additionally, we demonstrate that the presence of low-fidelity data can improve the predictions of physics-informed learning with DeepONets. We demonstrate the new multi-fidelity training in diverse examples, including modeling of the ice-sheet dynamics of the Humboldt glacier, Greenland, using two different fidelity models and also using the same physical model at two different resolutions.

Submitted 21 November, 2023; v1 submitted 19 April, 2022; originally announced April 2022.
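Per the abstract, the composite DeepONet combines a plentiful low-fidelity dataset with a scarcer high-fidelity one. A common building block in multifidelity learning is a learned correlation between low- and high-fidelity outputs. The sketch below shows only the simplest version of that idea, a linear correlation u_HF ≈ a·u_LF + c fitted by ordinary least squares on a few paired samples; the paper's composite network is more general, and the models, sample locations, and function names here are hypothetical.

```python
import math


def fit_linear_correlation(u_low, u_high):
    """Ordinary least-squares fit of u_high ~ a * u_low + c from paired samples."""
    n = len(u_low)
    mean_lo = sum(u_low) / n
    mean_hi = sum(u_high) / n
    cov = sum((lo - mean_lo) * (hi - mean_hi) for lo, hi in zip(u_low, u_high))
    var = sum((lo - mean_lo) ** 2 for lo in u_low)
    a = cov / var
    return a, mean_hi - a * mean_lo


def predict_high(u_low, a, c):
    """Correct cheap low-fidelity outputs with the fitted linear map."""
    return [a * u + c for u in u_low]


# Hypothetical models: the low-fidelity one is cheap, the high-fidelity one is not.
def low_fidelity(x):
    return math.sin(x)


def high_fidelity(x):
    return 1.5 * math.sin(x) + 0.2


x_hf = [0.0, 0.7, 1.4, 2.1]            # the few locations with high-fidelity data
a, c = fit_linear_correlation([low_fidelity(x) for x in x_hf],
                              [high_fidelity(x) for x in x_hf])
x_new = [0.1 * k for k in range(30)]   # dense locations where only LF is evaluated
u_hf_est = predict_high([low_fidelity(x) for x in x_new], a, c)
```

Because the hypothetical high-fidelity model here is exactly linear in the low-fidelity one, the fit recovers a = 1.5 and c = 0.2; when the true relation is nonlinear, a linear map leaves a residual that a nonlinear correction would have to capture.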

arXiv:2204.04321

Performance portable ice-sheet modeling with MALI

Authors: Jerry Watkins, Max Carlson, Kyle Shan, Irina Tezaur, Mauro Perego, Luca Bertagna, Carolyn Kao, Matthew J. Hoffman, Stephen F. Price

Abstract: High-resolution simulations of polar ice sheets play a crucial role in the ongoing effort to develop more accurate and reliable Earth-system models for probabilistic sea-level projections. These simulations often require a massive amount of memory and computation from large supercomputing clusters to provide sufficient accuracy and resolution. The latest exascale machines poised to come online contain a diverse set of computing architectures. In an effort to avoid architecture-specific programming and maintain productivity across platforms, the ice-sheet modeling code known as MALI uses high-level abstractions to integrate Trilinos libraries and the Kokkos programming model for performance portable code across a variety of different architectures. In this paper, we analyze the performance portable features of MALI via a performance analysis on current CPU-based and GPU-based supercomputers. The analysis highlights performance portable improvements made in finite element assembly and multigrid preconditioning within MALI, with speedups of 1.26-1.82x across CPU and GPU architectures, but also identifies the need to further improve performance in software coupling and preconditioning on GPUs. We also perform a weak scalability study and show that simulations on GPU-based machines perform 1.24-1.92x faster when utilizing the GPUs. The best performance is found in finite element assembly, which achieved a speedup of up to 8.65x and a weak scaling efficiency of 82.9% with GPUs.
We additionally describe an automated performance testing framework developed for this code base using a changepoint detection method. The framework is used to make actionable decisions about performance within MALI. We provide several concrete examples of scenarios in which the framework has identified performance regressions, improvements, and algorithm differences over the course of two years of development.

Submitted 8 April, 2022; originally announced April 2022.

Report number: SAND2022-4228 O
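The abstract does not specify which changepoint detection method the MALI performance-testing framework uses. As a stand-in, the sketch below finds the single split of a timing history that best separates it into two segments with different means (minimum total within-segment squared error), then labels the shift as a regression or improvement with a relative threshold. The function names, the min_size parameter, the 5% threshold, and the sample timings are hypothetical.

```python
def detect_mean_shift(series, min_size=3):
    """Best single split of `series` into two segments with different means.

    Returns (index, cost_reduction), where cost is the total squared deviation of
    each segment from its own mean; index is None if no split lowers the cost.
    Returns None if the series is too short to hold two segments of min_size.
    """
    n = len(series)
    if n < 2 * min_size:
        return None

    def sse(seg):
        m = sum(seg) / len(seg)
        return sum((v - m) ** 2 for v in seg)

    total = sse(series)
    best_idx, best_cost = None, total
    for k in range(min_size, n - min_size + 1):
        cost = sse(series[:k]) + sse(series[k:])
        if cost < best_cost:
            best_idx, best_cost = k, cost
    return best_idx, total - best_cost


def classify_shift(series, idx, rel_tol=0.05):
    """Label a detected shift in run times by its relative change in mean."""
    before = sum(series[:idx]) / idx
    after = sum(series[idx:]) / (len(series) - idx)
    change = (after - before) / before
    if change > rel_tol:
        return "regression"   # runs got slower
    if change < -rel_tol:
        return "improvement"  # runs got faster
    return "no significant change"


# Usage: hypothetical wall-clock times (seconds) from consecutive nightly runs.
timings = [10.1, 10.0, 10.2, 9.9, 10.1, 10.0, 12.0, 12.1, 11.9, 12.2, 12.0]
idx, gain = detect_mean_shift(timings)
label = classify_shift(timings, idx)
```

Here the split lands at index 6, where the mean run time jumps from about 10.05 s to about 12.04 s, so the shift is labeled a regression. Practical changepoint tools typically handle multiple changepoints and explicit noise models; this sketch only illustrates the mean-shift idea.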

arXiv:1912.04862

Robust Training and Initialization of Deep Neural Networks: An Adaptive Basis Viewpoint

Authors: Eric C. Cyr, Mamikon A. Gulian, Ravi G. Patel, Mauro Perego, Nathaniel A. Trask

Abstract: Motivated by the gap between theoretical optimal approximation rates of deep neural networks (DNNs) and the accuracy realized in practice, we seek to improve the training of DNNs. The adoption of an adaptive basis viewpoint of DNNs leads to novel initializations and a hybrid least squares/gradient descent optimizer. We provide analysis of these techniques and illustrate via numerical examples dramatic increases in accuracy and convergence rate for benchmarks characterizing scientific applications where DNNs are currently used, including regression problems and physics-informed neural networks for the solution of partial differential equations.

Submitted 10 December, 2019; originally announced December 2019.

Comments: 26 pages
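The adaptive basis viewpoint treats the hidden layers of a network as producing basis functions φ_i(x; θ) and the final linear layer as coefficients c, so that u(x) = Σ_i c_i φ_i(x; θ). A hybrid least squares/gradient descent optimizer, as named in the abstract, can then alternate between solving for c exactly by least squares with θ fixed and taking a gradient step in θ with c fixed. The sketch below applies that alternation to a one-hidden-layer tanh network in one dimension; the evenly spaced initialization, the small ridge term, the step size, and all function names are assumptions, not the initialization or optimizer details from the paper.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def basis(xs, w, b):
    """Hidden layer viewed as an adaptive basis: phi_i(x) = tanh(w_i * x + b_i)."""
    return [[math.tanh(wi * x + bi) for wi, bi in zip(w, b)] for x in xs]


def least_squares_coeffs(Phi, ys, ridge=1e-8):
    """Output weights from the normal equations (Phi^T Phi + ridge * I) c = Phi^T y."""
    m = len(Phi[0])
    A = [[sum(row[i] * row[j] for row in Phi) + (ridge if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    rhs = [sum(row[i] * y for row, y in zip(Phi, ys)) for i in range(m)]
    return solve(A, rhs)


def mse(Phi, c, ys):
    """Mean squared error of u(x) = sum_i c_i phi_i(x) against the targets."""
    return sum((sum(ci * p for ci, p in zip(c, row)) - y) ** 2
               for row, y in zip(Phi, ys)) / len(ys)


def hybrid_ls_gd(xs, ys, width=10, steps=20, lr=0.01):
    # Hypothetical initialization: unit transitions at evenly spaced points in [-1, 1].
    w = [3.0] * width
    b = [-3.0 * (-1.0 + 2.0 * i / (width - 1)) for i in range(width)]
    n = len(xs)
    for _ in range(steps):
        Phi = basis(xs, w, b)
        c = least_squares_coeffs(Phi, ys)  # LS step: exact solve for the linear layer
        # GD step: gradient of the MSE w.r.t. hidden parameters, with c held fixed.
        gw, gb = [0.0] * width, [0.0] * width
        for x, y, row in zip(xs, ys, Phi):
            r = sum(ci * p for ci, p in zip(c, row)) - y
            for i in range(width):
                d = 2.0 * r * c[i] * (1.0 - row[i] ** 2) / n
                gw[i] += d * x
                gb[i] += d
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b = [bi - lr * g for bi, g in zip(b, gb)]
    Phi = basis(xs, w, b)
    c = least_squares_coeffs(Phi, ys)      # final LS solve for the final basis
    return w, b, c, mse(Phi, c, ys)


# Usage: fit sin(pi x) on 50 evenly spaced points in [-1, 1].
xs = [-1.0 + 2.0 * k / 49 for k in range(50)]
ys = [math.sin(math.pi * x) for x in xs]
w, b, c, loss = hybrid_ls_gd(xs, ys)
```

Solving the linear layer exactly means gradient descent only has to move the basis functions themselves, which is the motivation for the split; a practical version would use a more stable least-squares solver than the normal equations.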

arXiv:1802.10023

DOI: 10.1364/OPTICA.5.000263

Dual polarization nonlinear Fourier transform-based optical communication system

Authors: Simone Gaiarin, Auro Michele Perego, Edson Porto da Silva, Francesco Da Ros, Darko Zibar

Abstract: New services and applications are causing an exponential increase in internet traffic. In a few years, current fiber optic communication system infrastructure will not be able to meet this demand because fiber nonlinearity dramatically limits the information transmission rate. Eigenvalue communication could potentially overcome these limitations. It relies on a mathematical technique called "nonlinear Fourier transform (NFT)" to exploit the "hidden" linearity of the nonlinear Schrödinger equation as the master model for signal propagation in an optical fiber. We present here the theoretical tools describing the NFT for the Manakov system and report on experimental transmission results for dual polarization in fiber optic eigenvalue communications. A transmission of up to 373.5 km with bit error rate less than the hard-decision forward error correction threshold has been achieved. Our results demonstrate that dual-polarization NFT can work in practice and enable an increased spectral efficiency in NFT-based communication systems, which are currently based on single polarization channels.

Submitted 5 March, 2018; v1 submitted 27 February, 2018; originally announced February 2018.

arXiv:1708.00350

Experimental Demonstration of Dual Polarization Nonlinear Frequency Division Multiplexed Optical Transmission System

Authors: Simone Gaiarin, Auro Michele Perego, Edson Porto da Silva, Francesco Da Ros, Darko Zibar

Abstract: Multi-eigenvalues transmission with information encoded simultaneously in both orthogonal polarizations is experimentally demonstrated. Performance below the HD-FEC limit is demonstrated for 8-bits/symbol 1-GBd signals after transmission up to 207 km of SSMF.

Submitted 1 August, 2017; originally announced August 2017.

Showing 1–7 of 7 results for author: Perego, M