
doi 10.1145/3626203.3670536

Benchmarking with Supernovae: A Performance Study of the FLASH Code

Authors: Joshua Martin, Catherine Feldman, Eva Siegmann, Tony Curtis, David Carlson, Firat Coskun, Daniel Wood, Raul Gonzalez, Robert J. Harrison, Alan C. Calder

Abstract: Astrophysical simulations are computation-, memory-, and thus energy-intensive, thereby requiring new hardware advances for progress. Stony Brook University recently expanded its computing cluster "SeaWulf" with the addition of 94 new nodes featuring Intel Sapphire Rapids Xeon Max series CPUs. We present a performance and power efficiency study of this hardware performed with FLASH: a multi-scale, multi-physics, adaptive mesh-based software instrument. We extend this study to compare performance to that of Stony Brook's Ookami testbed, which features ARM-based A64FX-700 processors, and to that of SeaWulf's AMD EPYC Milan and Intel Skylake nodes. Our application is a stellar explosion known as a thermonuclear (Type Ia) supernova; for this 3D problem, FLASH includes operators for hydrodynamics, gravity, and nuclear burning, in addition to routines for the material equation of state. We perform a strong-scaling study with a 220 GB problem size to explore both single- and multi-node performance. Our study explores the performance of different MPI mappings and the distribution of processors across nodes. From these tests, we determined the optimal configuration to balance runtime and energy consumption for our application.

Submitted 28 August, 2024; originally announced August 2024.

Comments: Accepted to PEARC '24 (Practice and Experience in Advanced Research Computing)

Journal ref: Practice and Experience in Advanced Research Computing 2024, Article no. 8

arXiv:2311.04259

Ookami: An A64FX Computing Resource

Authors: A. C. Calder, E. Siegmann, C. Feldman, S. Chheda, D. C. Smolarski, F. D. Swesty, A. Curtis, J. Dey, D. Carlson, B. Michalowicz, R. J. Harrison

Abstract: We present a look at Ookami, a project providing community access to a testbed supercomputer with the ARM-based A64FX processors developed by a collaboration between RIKEN and Fujitsu and deployed in the Japanese supercomputer Fugaku. We describe the project, provide details about the user base and education/training program, and present highlights from performance studies of two astrophysical simulation codes.

Submitted 7 November, 2023; originally announced November 2023.

Comments: 9 pages, 3 figures, submitted to the Proceedings of the 15th International Conference on Numerical Modeling of Space Plasma Flows

arXiv:2309.04652

doi 10.1145/3569951.3597583

A Further Study of Linux Kernel Hugepages on A64FX with FLASH, an Astrophysical Simulation Code

Authors: Catherine Feldman, Smeet Chheda, Alan C. Calder, Eva Siegmann, John Dey, Tony Curtis, Robert J. Harrison

Abstract: We present an expanded study of the performance of FLASH when using Linux Kernel Hugepages on Ookami, an HPE Apollo 80 A64FX platform. FLASH is a multi-scale, multi-physics simulation code written principally in modern Fortran that makes use of the PARAMESH library to manage a block-structured adaptive mesh. Our initial study used only the Fujitsu compiler to utilize standard hugepages (hp), but further investigation allowed us to utilize hp with multiple compilers by linking to the Fujitsu library libmpg, and to utilize transparent hugepages (thp) by enabling them at the node level. By comparing the results of hardware counters and in-code timers, we found that hp and thp do not significantly impact the runtime performance of FLASH. Interestingly, there is a significant reduction in TLB misses, there are differences in cache and memory access counters, and strange behavior is observed when using thp.

Submitted 8 September, 2023; originally announced September 2023.

Comments: 10 pages, 2 figures, 7 tables. Proceedings of Practice and Experience in Advanced Research Computing (PEARC '23), July 23–27, 2023, Portland, OR, USA

ACM Class: C.1.4; I.6.0; J.2

Journal ref: Practice and Experience in Advanced Research Computing (PEARC '23). Association for Computing Machinery, New York, NY, USA, 186-195. (July 2023)

arXiv:2207.13685

On Using Linux Kernel Huge Pages with FLASH, an Astrophysical Simulation Code

Authors: Alan C. Calder, Catherine Feldman, Eva Siegmann, John Dey, Anthony Curtis, Smeet Chheda, Robert J. Harrison

Abstract: We present efforts at improving the performance of FLASH, a multi-scale, multi-physics simulation code principally for astrophysical applications, by using huge pages on Ookami, an HPE Apollo 80 A64FX platform. FLASH is written principally in modern Fortran and makes use of the PARAMESH library to manage a block-structured adaptive mesh. We explored options for enabling the use of huge pages with several compilers, but we were only able to successfully use huge pages when compiling with the Fujitsu compiler. The use of huge pages substantially reduced the number of translation lookaside buffer misses, but overall performance gains were marginal.

Submitted 27 July, 2022; originally announced July 2022.

Comments: 6 pages, 1 figure, accepted to Embracing Arm for HPC, An IEEE Cluster 2022 Workshop

arXiv:2207.13251

Performance of an Astrophysical Radiation Hydrodynamics Code under Scalable Vector Extension Optimization

Authors: Dennis C. Smolarski, F. Douglas Swesty, Alan C. Calder

Abstract: We present results of a performance study of an astrophysical radiation hydrodynamics code, V2D, on the Arm-based A64FX processor developed by Fujitsu. The code solves sparse linear systems, a task for which the A64FX architecture should be well suited. We performed the performance analysis study on Ookami, an Apollo 80 platform utilizing the A64FX processor. We explored several compilers and performance analysis packages and found that the code did not perform as expected under scalable vector extension optimization, suggesting that a "deeper dive" into analyzing the code is worthwhile. However, a simple driver program that exercised basic sparse linear algebra routines used by V2D did show significant speedup with the use of the scalable vector extension optimization. We present the initial results from the study, which used V2D on a relatively simple test problem that emphasized the repeated solution of sparse linear systems.

Submitted 26 July, 2022; originally announced July 2022.

Comments: 4 pages, 1 figure, accepted for EAHPC-2022 - Embracing Arm for High Performance Computing Workshop, An IEEE Cluster 2022 Workshop

arXiv:2106.08987

doi 10.1145/3437359.3465578

Ookami: Deployment and Initial Experiences

Authors: Andrew Burford, Alan C. Calder, David Carlson, Barbara Chapman, Firat Coşkun, Tony Curtis, Catherine Feldman, Robert J. Harrison, Yan Kang, Benjamin Michalowicz, Eric Raut, Eva Siegmann, Daniel G. Wood, Robert L. Deleon, Mathew Jones, Nikolay A. Simakov, Joseph P. White, Dossay Oryspayev

Abstract: Ookami is a computer technology testbed supported by the United States National Science Foundation. It provides researchers with access to the A64FX processor developed by Fujitsu in collaboration with RIKEN for the Japanese path to exascale computing, as deployed in Fugaku, the fastest computer in the world. By focusing on crucial architectural details, the ARM-based, multi-core, 512-bit SIMD-vector processor with ultrahigh-bandwidth memory promises to retain familiar and successful programming models while achieving very high performance for a wide range of applications. We review relevant technology and system details, and the main body of the paper focuses on initial experiences with the hardware and software ecosystem for micro-benchmarks, mini-apps, and full applications, and starts to answer questions about where such technologies fit into the NSF ecosystem.

Submitted 16 June, 2021; originally announced June 2021.

Comments: 14 pages, 7 figures, PEARC '21: Practice and Experience in Advanced Research Computing, July 18–22, 2021, Boston, MA, USA

arXiv:1911.06359

doi 10.1007/978-3-030-40943-2_17

Twitter Watch: Leveraging Social Media to Monitor and Predict Collective-Efficacy of Neighborhoods

Authors: Moniba Keymanesh, Saket Gurukar, Bethany Boettner, Christopher Browning, Catherine Calder, Srinivasan Parthasarathy

Abstract: Sociologists associate the spatial variation of crime within an urban setting with the concept of collective efficacy. The collective efficacy of a neighborhood is defined as social cohesion among neighbors combined with their willingness to intervene on behalf of the common good. Sociologists measure collective efficacy by conducting survey studies designed to measure individuals' perception of their community. In this work, we employ the curated data from a survey study (ground truth) and examine the effectiveness of substituting costly survey questionnaires with proxies derived from social media. We enrich a corpus of tweets mentioning a local venue with several linguistic and topological features. We then propose a pairwise learning-to-rank model with the goal of identifying a ranking of neighborhoods that is similar to the ranking obtained from the ground-truth collective efficacy values. In our experiments, we find that our generated ranking of neighborhoods achieves 0.77 Kendall tau-x ranking agreement with the ground-truth ranking. Overall, our results are up to 37% better than traditional baselines.

Submitted 14 November, 2019; originally announced November 2019.

Comments: 10 pages, 7 figures

Journal ref: Complex Networks XI 2020

arXiv:1712.08641

The Geometry of Continuous Latent Space Models for Network Data

Authors: Anna L. Smith, Dena M. Asta, Catherine A. Calder

Abstract: We review the class of continuous latent space (statistical) models for network data, paying particular attention to the role of the geometry of the latent space. In these models, the presence or absence of network dyadic ties is assumed to be conditionally independent given the dyads' unobserved positions in a latent space. In this way, these models provide a probabilistic framework for embedding network nodes in a continuous space equipped with a geometry that facilitates the description of dependence between random dyadic ties. Specifically, these models naturally capture homophilous tendencies and triadic clustering, among other common properties of observed networks. In addition to reviewing the literature on continuous latent space models from a geometric perspective, we highlight, via intuition and simulation, the important role the geometry of the latent space plays in the properties of networks arising from these models. Finally, we discuss results from spectral graph theory that allow us to explore the role of the geometry of the latent space, independent of network size. We conclude with conjectures about how these results might be used to infer the appropriate latent space geometry from observed networks.

Submitted 25 March, 2019; v1 submitted 22 December, 2017; originally announced December 2017.

arXiv:1509.03271

Empirical Reference Distributions for Networks of Different Size

Authors: Anna Smith, Catherine A. Calder, Christopher R. Browning

Abstract: Network analysis has become an increasingly prevalent research tool across a vast range of scientific fields. Here, we focus on the particular issue of comparing network statistics, i.e., graph-level measures of network structural features, across multiple networks that differ in size. Although "normalized" versions of some network statistics exist, we demonstrate via simulation why direct comparison of raw and normalized statistics is often inappropriate. We examine a recent suggestion to normalize network statistics relative to Erdős–Rényi random graphs and demonstrate via simulation how this is an improvement over direct comparison, but still sometimes problematic. We propose a new adjustment method based on a reference distribution constructed as a mixture model of random graphs that reflect the dependence structure exhibited in the observed networks. We show that using simple Bernoulli models as mixture components in this reference distribution can provide adjusted network statistics that are relatively comparable across different network sizes but still describe interesting features of networks, and that this can be accomplished at relatively low computational expense. Finally, we apply this methodology to a collection of co-location networks derived from the Los Angeles Family and Neighborhood Survey activity location data.

Submitted 4 March, 2016; v1 submitted 10 September, 2015; originally announced September 2015.

arXiv:1406.5954

Bilinear Mixed-Effects Models for Affiliation Networks

Authors: Yanan Jia, Catherine A. Calder, Christopher R. Browning

Abstract: An affiliation network is a particular type of two-mode social network that consists of a set of 'actors' and a set of 'events', where ties indicate an actor's participation in an event. Although networks describe a variety of consequential social structures, statistical methods for studying affiliation networks are less well developed than methods for studying one-mode, or actor-actor, networks. One way to analyze affiliation networks is to consider one-mode network matrices that are derived from an affiliation network, but this approach may lead to the loss of important structural features of the data. The most comprehensive approach is to study both actors and events simultaneously. In this paper, we extend the bilinear mixed-effects model, a type of latent space model developed for one-mode networks, to the affiliation network setting by considering the dependence patterns in the interactions between actors and events, and we describe a Markov chain Monte Carlo algorithm for Bayesian inference. We use our model to explore patterns in extracurricular activity membership of students in a racially diverse high school in a Midwestern metropolitan area. Using techniques from spatial point pattern analysis, we show how our model can provide insight into patterns of racial segregation in the voluntary extracurricular activity participation profiles of adolescents.

Submitted 10 June, 2015; v1 submitted 23 June, 2014; originally announced June 2014.

Showing 1–10 of 10 results for author: Calder, C