-
Benchmarking with Supernovae: A Performance Study of the FLASH Code
Authors:
Joshua Martin,
Catherine Feldman,
Eva Siegmann,
Tony Curtis,
David Carlson,
Firat Coskun,
Daniel Wood,
Raul Gonzalez,
Robert J. Harrison,
Alan C. Calder
Abstract:
Astrophysical simulations are computation, memory, and thus energy intensive, thereby requiring new hardware advances for progress. Stony Brook University recently expanded its computing cluster "SeaWulf" with an addition of 94 new nodes featuring Intel Sapphire Rapids Xeon Max series CPUs. We present a performance and power efficiency study of this hardware performed with FLASH: a multi-scale, multi-physics, adaptive mesh-based software instrument. We extend this study to compare performance to that of Stony Brook's Ookami testbed which features ARM-based A64FX-700 processors, and SeaWulf's AMD EPYC Milan and Intel Skylake nodes. Our application is a stellar explosion known as a thermonuclear (Type Ia) supernova and for this 3D problem, FLASH includes operators for hydrodynamics, gravity, and nuclear burning, in addition to routines for the material equation of state. We perform a strong-scaling study with a 220 GB problem size to explore both single- and multi-node performance. Our study explores the performance of different MPI mappings and the distribution of processors across nodes. From these tests, we determined the optimal configuration to balance runtime and energy consumption for our application.
Submitted 28 August, 2024;
originally announced August 2024.
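A strong-scaling study holds the problem size fixed while the processor count grows. The usual speedup and parallel-efficiency bookkeeping can be sketched as follows; this is a minimal illustration, the function name `strong_scaling` is hypothetical, and the timings are made-up placeholders rather than results from the paper.

```python
def strong_scaling(timings):
    """Return (procs, speedup, efficiency) rows relative to the smallest run.

    timings: dict mapping processor count -> wall-clock seconds
             for the same fixed-size problem.
    """
    base_procs = min(timings)
    base_time = timings[base_procs]
    rows = []
    for procs in sorted(timings):
        speedup = base_time / timings[procs]
        # Ideal speedup equals procs / base_procs, so efficiency is the ratio.
        efficiency = speedup / (procs / base_procs)
        rows.append((procs, speedup, efficiency))
    return rows


# Hypothetical timings in seconds; placeholders only, not measured data.
example = {1: 100.0, 2: 55.0, 4: 30.0}
for procs, speedup, eff in strong_scaling(example):
    print(f"{procs:>3} procs  speedup {speedup:5.2f}  efficiency {eff:5.2f}")
```

An efficiency near 1.0 indicates near-ideal scaling; values well below 1.0 suggest that communication or memory bandwidth is limiting the run.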
-
Ookami: An A64FX Computing Resource
Authors:
A. C. Calder,
E. Siegmann,
C. Feldman,
S. Chheda,
D. C. Smolarski,
F. D. Swesty,
A. Curtis,
J. Dey,
D. Carlson,
B. Michalowicz,
R. J. Harrison
Abstract:
We present a look at Ookami, a project providing community access to a testbed supercomputer with the ARM-based A64FX processors developed by a collaboration between RIKEN and Fujitsu and deployed in the Japanese supercomputer Fugaku. We describe the project, provide details about the user base and education/training program, and present highlights from performance studies of two astrophysical simulation codes.
Submitted 7 November, 2023;
originally announced November 2023.
-
A Further Study of Linux Kernel Hugepages on A64FX with FLASH, an Astrophysical Simulation Code
Authors:
Catherine Feldman,
Smeet Chheda,
Alan C. Calder,
Eva Siegmann,
John Dey,
Tony Curtis,
Robert J. Harrison
Abstract:
We present an expanded study of the performance of FLASH when using Linux Kernel Hugepages on Ookami, an HPE Apollo 80 A64FX platform. FLASH is a multi-scale, multi-physics simulation code written principally in modern Fortran and makes use of the PARAMESH library to manage a block-structured adaptive mesh. Our initial study used only the Fujitsu compiler to utilize standard hugepages (hp), but further investigation allowed us to utilize hp for multiple compilers by linking to the Fujitsu library libmpg and transparent hugepages (thp) by enabling it at the node level. By comparing the results of hardware counters and in-code timers, we found that hp and thp do not significantly impact the runtime performance of FLASH. Interestingly, there is a significant reduction in the TLB misses, differences in cache and memory access counters, and strange behavior is observed when using thp.
Submitted 8 September, 2023;
originally announced September 2023.
-
On Using Linux Kernel Huge Pages with FLASH, an Astrophysical Simulation Code
Authors:
Alan C. Calder,
Catherine Feldman,
Eva Siegmann,
John Dey,
Anthony Curtis,
Smeet Chheda,
Robert J. Harrison
Abstract:
We present efforts at improving the performance of FLASH, a multi-scale, multi-physics simulation code principally for astrophysical applications, by using huge pages on Ookami, an HPE Apollo 80 A64FX platform. FLASH is written principally in modern Fortran and makes use of the PARAMESH library to manage a block-structured adaptive mesh. We explored options for enabling the use of huge pages with several compilers, but we were only able to successfully use huge pages when compiling with the Fujitsu compiler. The use of huge pages substantially reduced the number of translation lookaside buffer misses, but overall performance gains were marginal.
Submitted 27 July, 2022;
originally announced July 2022.
-
Performance of an Astrophysical Radiation Hydrodynamics Code under Scalable Vector Extension Optimization
Authors:
Dennis C. Smolarski,
F. Douglas Swesty,
Alan C. Calder
Abstract:
We present results of a performance study of an astrophysical radiation hydrodynamics code, V2D, on the Arm-based A64FX processor developed by Fujitsu. The code solves sparse linear systems, a task for which the A64FX architecture should be well suited. We performed the performance analysis study on Ookami, an Apollo 80 platform utilizing the A64FX processor. We explored several compilers and performance analysis packages and found the code did not perform as expected under scalable vector extension optimization, suggesting that a "deeper dive" into analyzing the code is worthwhile. However, a simple driver program that exercised basic sparse linear algebra routines used by V2D did show significant speedup with the use of the scalable vector extension optimization. We present the initial results from the study which used V2D on a relatively simple test problem that emphasized the repeated solution of sparse linear systems.
Submitted 26 July, 2022;
originally announced July 2022.
-
Ookami: Deployment and Initial Experiences
Authors:
Andrew Burford,
Alan C. Calder,
David Carlson,
Barbara Chapman,
Firat Coşkun,
Tony Curtis,
Catherine Feldman,
Robert J. Harrison,
Yan Kang,
Benjamin Michalowicz,
Eric Raut,
Eva Siegmann,
Daniel G. Wood,
Robert L. DeLeon,
Mathew Jones,
Nikolay A. Simakov,
Joseph P. White,
Dossay Oryspayev
Abstract:
Ookami is a computer technology testbed supported by the United States National Science Foundation. It provides researchers with access to the A64FX processor developed by Fujitsu in collaboration with RIKEN for the Japanese path to exascale computing, as deployed in Fugaku, the fastest computer in the world. By focusing on crucial architectural details, the ARM-based, multi-core, 512-bit SIMD-vector processor with ultrahigh-bandwidth memory promises to retain familiar and successful programming models while achieving very high performance for a wide range of applications. We review relevant technology and system details, and the main body of the paper focuses on initial experiences with the hardware and software ecosystem for micro-benchmarks, mini-apps, and full applications, and starts to answer questions about where such technologies fit into the NSF ecosystem.
Submitted 16 June, 2021;
originally announced June 2021.
-
Twitter Watch: Leveraging Social Media to Monitor and Predict Collective-Efficacy of Neighborhoods
Authors:
Moniba Keymanesh,
Saket Gurukar,
Bethany Boettner,
Christopher Browning,
Catherine Calder,
Srinivasan Parthasarathy
Abstract:
Sociologists associate the spatial variation of crime within an urban setting with the concept of collective efficacy. The collective efficacy of a neighborhood is defined as social cohesion among neighbors combined with their willingness to intervene on behalf of the common good. Sociologists measure collective efficacy by conducting survey studies designed to measure individuals' perception of their community. In this work, we employ the curated data from a survey study (ground truth) and examine the effectiveness of substituting costly survey questionnaires with proxies derived from social media. We enrich a corpus of tweets mentioning a local venue with several linguistic and topological features. We then propose a pairwise learning-to-rank model with the goal of identifying a ranking of neighborhoods that is similar to the ranking obtained from the ground truth collective efficacy values. In our experiments, we find that our generated ranking of neighborhoods achieves 0.77 Kendall tau-x ranking agreement with the ground truth ranking. Overall, our results are up to 37% better than traditional baselines.
Submitted 14 November, 2019;
originally announced November 2019.
-
The Geometry of Continuous Latent Space Models for Network Data
Authors:
Anna L. Smith,
Dena M. Asta,
Catherine A. Calder
Abstract:
We review the class of continuous latent space (statistical) models for network data, paying particular attention to the role of the geometry of the latent space. In these models, the presence/absence of network dyadic ties are assumed to be conditionally independent given the dyads' unobserved positions in a latent space. In this way, these models provide a probabilistic framework for embedding network nodes in a continuous space equipped with a geometry that facilitates the description of dependence between random dyadic ties. Specifically, these models naturally capture homophilous tendencies and triadic clustering, among other common properties of observed networks. In addition to reviewing the literature on continuous latent space models from a geometric perspective, we highlight the important role the geometry of the latent space plays on properties of networks arising from these models via intuition and simulation. Finally, we discuss results from spectral graph theory that allow us to explore the role of the geometry of the latent space, independent of network size. We conclude with conjectures about how these results might be used to infer the appropriate latent space geometry from observed networks.
Submitted 25 March, 2019; v1 submitted 22 December, 2017;
originally announced December 2017.
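One common member of this model class is a latent distance model in Euclidean space, where ties are conditionally independent given node positions and tie probability falls off with latent distance. A minimal sketch follows; the logistic link, the intercept `alpha`, the standard-normal two-dimensional positions, and both function names are illustrative assumptions, not the specification used in the paper.

```python
import math
import random


def tie_probability(z_i, z_j, alpha):
    """Logistic tie probability that decreases with Euclidean latent distance."""
    distance = math.dist(z_i, z_j)
    return 1.0 / (1.0 + math.exp(-(alpha - distance)))


def sample_network(n_nodes, alpha=1.0, dim=2, seed=0):
    """Draw latent positions, then sample conditionally independent undirected ties."""
    rng = random.Random(seed)
    # Assumed prior: independent standard-normal coordinates for each node.
    positions = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_nodes)]
    edges = set()
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if rng.random() < tie_probability(positions[i], positions[j], alpha):
                edges.add((i, j))
    return positions, edges


positions, edges = sample_network(n_nodes=10)
print(f"{len(edges)} ties among {len(positions)} nodes")
```

Because nearby nodes are more likely to be tied, and distances obey the triangle inequality, a model of this form tends to produce the homophily and triadic clustering the abstract mentions. Swapping `math.dist` for a different metric is one way to experiment with other latent geometries.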
-
Empirical Reference Distributions for Networks of Different Size
Authors:
Anna Smith,
Catherine A. Calder,
Christopher R. Browning
Abstract:
Network analysis has become an increasingly prevalent research tool across a vast range of scientific fields. Here, we focus on the particular issue of comparing network statistics, i.e. graph-level measures of network structural features, across multiple networks that differ in size. Although "normalized" versions of some network statistics exist, we demonstrate via simulation why direct comparison of raw and normalized statistics is often inappropriate. We examine a recent suggestion to normalize network statistics relative to Erdős-Rényi random graphs and demonstrate via simulation how this is an improvement over direct comparison, but still sometimes problematic. We propose a new adjustment method based on a reference distribution constructed as a mixture model of random graphs which reflect the dependence structure exhibited in the observed networks. We show that using simple Bernoulli models as mixture components in this reference distribution can provide adjusted network statistics that are relatively comparable across different network sizes but still describe interesting features of networks, and that this can be accomplished at relatively low computational expense. Finally, we apply this methodology to a collection of co-location networks derived from the Los Angeles Family and Neighborhood Survey activity location data.
Submitted 4 March, 2016; v1 submitted 10 September, 2015;
originally announced September 2015.
-
Bilinear Mixed-Effects Models for Affiliation Networks
Authors:
Yanan Jia,
Catherine A. Calder,
Christopher R. Browning
Abstract:
An affiliation network is a particular type of two-mode social network that consists of a set of 'actors' and a set of 'events' where ties indicate an actor's participation in an event. Although networks describe a variety of consequential social structures, statistical methods for studying affiliation networks are less well developed than methods for studying one-mode, or actor-actor, networks. One way to analyze affiliation networks is to consider one-mode network matrices that are derived from an affiliation network, but this approach may lead to the loss of important structural features of the data. The most comprehensive approach is to study both actors and events simultaneously. In this paper, we extend the bilinear mixed-effects model, a type of latent space model developed for one-mode networks, to the affiliation network setting by considering the dependence patterns in the interactions between actors and events and describe a Markov chain Monte Carlo algorithm for Bayesian inference. We use our model to explore patterns in extracurricular activity membership of students in a racially-diverse high school in a Midwestern metropolitan area. Using techniques from spatial point pattern analysis, we show how our model can provide insight into patterns of racial segregation in the voluntary extracurricular activity participation profiles of adolescents.
Submitted 10 June, 2015; v1 submitted 23 June, 2014;
originally announced June 2014.