-
Ripples spreading across the Galactic disc: Interplay of direct and indirect effects of the Sagittarius dwarf impact
Authors:
Tetsuro Asano,
Michiko S. Fujii,
Junichi Baba,
Simon Portegies Zwart,
Jeroen Bédorf
Abstract:
Gaia data have revealed vertically asymmetric phase-space structures in the Milky Way (MW) disc, such as phase spirals, indicating vertical oscillations. These oscillations exhibit two distinct modes: the bending mode and the breathing mode, associated with one-arm and two-arm phase spirals, respectively. This study aims to explore the excitation mechanisms of the bending and breathing modes and their subsequent evolution in the MW disc, focusing on the interplay between direct perturbations from the Sagittarius dwarf galaxy and indirect contributions from tidally induced spiral arms. We perform high-resolution $N$-body simulations to model the interaction between an MW-like disc galaxy and a Sagittarius dwarf-like satellite. These simulations resolve fine phase-space structures, enabling analysis of the bending and breathing modes at both macroscopic (global bending and breathing waves) and microscopic (local phase spirals) scales. Our simulations demonstrate that the satellite's perturbation directly excites the bending mode and induces spiral arms in the galactic disc. These spiral arms excite the breathing mode, making it an indirect consequence of the satellite interaction. Initially, the bending mode dominates, but it rapidly decays due to horizontal mixing. In contrast, the breathing mode persists for a longer duration, sustained by the spiral arms, leading to a transition from a bending-dominated to a breathing-dominated state. This transition progresses faster in the inner galaxy than in the outer regions. The simulations reproduce the one-arm phase spiral observed in the solar neighbourhood and reveal two-arm phase spirals, particularly in the inner galaxy, associated with spiral arm-induced breathing modes. Our findings highlight the combined effects of direct satellite perturbations and indirect spiral arm dynamics in shaping the vertical structure of the MW disc.
Submitted 21 January, 2025;
originally announced January 2025.
-
Impact of bar resonances in the velocity-space distribution of the solar neighbourhood stars in a self-consistent $N$-body Galactic disc simulation
Authors:
Tetsuro Asano,
Michiko S. Fujii,
Junichi Baba,
Jeroen Bédorf,
Elena Sellentin,
Simon Portegies Zwart
Abstract:
The velocity-space distribution of the solar neighbourhood stars shows complex substructures. Most previous studies use static potentials to investigate their origins. Instead, we use a self-consistent $N$-body model of the Milky Way, whose potential is asymmetric and evolves with time. In this paper, we quantitatively evaluate the similarity between the velocity-space distributions in the $N$-body model and that of the solar neighbourhood, using the Kullback-Leibler divergence (KLD). The KLD analysis shows the time evolution and spatial variation of the velocity-space distribution. The KLD fluctuates with time, which indicates that the velocity-space distribution at a fixed position is not always similar to that of the solar neighbourhood. Some positions show velocity-space distributions with small KLDs (high similarities) more frequently than others. One of them is located at $(R,φ)=(8.2\;\mathrm{kpc}, 30^{\circ})$, where $R$ and $φ$ are the distance from the galactic centre and the angle with respect to the bar's major axis, respectively. The detection frequency is higher in the inter-arm regions than in the arm regions. In the velocity maps with small KLDs, we identify velocity-space substructures that consist of particles trapped in bar resonances. The bar resonances have a significant impact on the stellar velocity-space distribution even though the galactic potential is not static.
Submitted 23 May, 2022; v1 submitted 1 December, 2021;
originally announced December 2021.
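The KLD comparison described in the abstract above can be illustrated with a minimal sketch. It assumes both samples are binned on a shared two-dimensional velocity grid and that empty bins receive a small floor value. The function names, grid limits, bin count, and floor are hypothetical choices for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def histogram2d(points, lo, hi, bins):
    """Count (x, y) points on a square bins-by-bins grid covering [lo, hi)."""
    counts = Counter()
    width = (hi - lo) / bins
    for x, y in points:
        if lo <= x < hi and lo <= y < hi:
            # Clamp the index to guard against floating-point rounding at the edge.
            i = min(int((x - lo) / width), bins - 1)
            j = min(int((y - lo) / width), bins - 1)
            counts[(i, j)] += 1
    return counts


def kl_divergence(points_p, points_q, lo=-100.0, hi=100.0, bins=20, floor=1e-9):
    """Estimate D_KL(P || Q) between two binned 2D velocity samples.

    The floor (an assumption of this sketch) keeps empty bins from
    producing log(0) or a division by zero.
    """
    hist_p = histogram2d(points_p, lo, hi, bins)
    hist_q = histogram2d(points_q, lo, hi, bins)
    cells = [(i, j) for i in range(bins) for j in range(bins)]
    p = [hist_p[c] + floor for c in cells]
    q = [hist_q[c] + floor for c in cells]
    sum_p, sum_q = sum(p), sum(q)
    return sum(
        (pi / sum_p) * math.log((pi / sum_p) / (qi / sum_q))
        for pi, qi in zip(p, q)
    )
```

A small value means the two binned distributions are similar. The divergence is not symmetric, so the order of the two samples matters.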
-
Resolving local and global kinematic signatures of satellite mergers with billion particle simulations
Authors:
Jason A. S. Hunt,
Ioana A. Stelea,
Kathryn V. Johnston,
Suroor S. Gandhi,
Chervin F. P. Laporte,
Jeroen Bedorf
Abstract:
In this work we present two new $\sim10^9$ particle self-consistent simulations of the merger of a Sagittarius-like dwarf galaxy with a Milky Way-like disc galaxy. One model is a violent merger creating a thick disc and a Gaia-Enceladus/Sausage-like remnant. The other is a highly stable disc, which we use to illustrate how the improved phase-space resolution, compared to previous works, allows us to better examine the formation and evolution of structures that have been observed in small, local volumes in the Milky Way, such as the $z-v_z$ phase spiral and clustering in the $v_{\mathrm{R}}-v_φ$ plane. The local $z-v_z$ phase spirals are clearly linked to the global asymmetry across the disc: we find both 2-armed and 1-armed phase spirals, which are related to breathing and bending behaviors respectively. Hercules-like moving groups are common, clustered in $v_{\mathrm{R}}-v_φ$ in local data samples in the simulation. These groups migrate outwards from the inner galaxy, matching observed metallicity trends even in the absence of a galactic bar. We currently release the best fitting 'present day' merger snapshots along with the unperturbed galaxies for comparison.
Submitted 11 September, 2021; v1 submitted 13 July, 2021;
originally announced July 2021.
-
DeepGalaxy: Deducing the Properties of Galaxy Mergers from Images Using Deep Neural Networks
Authors:
Maxwell X. Cai,
Jeroen Bédorf,
Vikram A. Saletore,
Valeriu Codreanu,
Damian Podareanu,
Adel Chaibi,
Penny X. Qian
Abstract:
Galaxy mergers, the dynamical process during which two galaxies collide, are among the most spectacular phenomena in the Universe. During this process, the two colliding galaxies are tidally disrupted, producing significant visual features that evolve as a function of time. These visual features contain valuable clues for deducing the physical properties of the galaxy mergers. In this work, we propose DeepGalaxy, a visual analysis framework trained to predict the physical properties of galaxy mergers based on their morphology. Based on an encoder-decoder architecture, DeepGalaxy encodes the input images to a compressed latent space $z$, and determines the similarity of images according to the latent-space distance. DeepGalaxy consists of a fully convolutional autoencoder (FCAE) which generates activation maps at its 3D latent-space, and a variational autoencoder (VAE) which compresses the activation maps into a 1D vector, and a classifier that generates labels from the activation maps. The backbone of the FCAE can be fully customized according to the complexity of the images. DeepGalaxy demonstrates excellent scaling performance on parallel machines. On the Endeavour supercomputer, the scaling efficiency exceeds 0.93 when trained on 128 workers, and it maintains above 0.73 when trained with 512 workers. Without having to carry out expensive numerical simulations, DeepGalaxy makes inferences of the physical properties of galaxy mergers directly from images, and thereby achieves a speedup factor of $\sim 10^5$.
Submitted 22 October, 2020;
originally announced October 2020.
-
Trimodal structure of Hercules stream explained by originating from bar resonances
Authors:
Tetsuro Asano,
Michiko S. Fujii,
Junichi Baba,
Jeroen Bédorf,
Elena Sellentin,
Simon Portegies Zwart
Abstract:
Gaia Data Release 2 revealed detailed structures of nearby stars in phase space. These include the Hercules stream, whose origin is still debated. Most of the previous numerical studies conjectured that the observed structures originate from orbits in resonance with the bar, based on static potential models for the Milky Way. We, in contrast, approach the problem via a self-consistent, dynamic, and morphologically well-resolved model, namely a full $N$-body simulation of the Milky Way. Our simulation comprises about 5.1 billion particles in the galactic stellar bulge, bar, disk, and dark-matter halo and is evolved to 10 Gyr. Our model's disk component is composed of 200 million particles, and its simulation snapshots are stored every 10 Myr, enabling us to resolve and classify resonant orbits of representative samples of stars. After choosing the Sun's position in the simulation, we compare the distribution of stars in its neighborhood with Gaia's astrometric data, thereby establishing the role of identified resonantly trapped stars in the formation of Hercules-like structures. From our orbital spectral analysis, we identify multiple resonances, especially higher-order ones. Our results suggest that the Hercules stream is dominated by the 4:1 and 5:1 outer Lindblad and corotation resonances. In total, this yields a trimodal structure of the Hercules stream. From the relation between resonances and ridges in phase space, our model favors a slow pattern speed of the Milky Way bar (40--45 $\mathrm{km \; s^{-1} \; kpc^{-1}}$).
Submitted 15 September, 2020; v1 submitted 28 May, 2020;
originally announced May 2020.
-
Bonsai-SPH: A GPU accelerated astrophysical Smoothed Particle Hydrodynamics code
Authors:
Jeroen Bédorf,
Simon Portegies Zwart
Abstract:
We present the smoothed-particle hydrodynamics simulation code, Bonsai-SPH, which is a continuation of our previously developed gravity-only hierarchical $N$-body code (called Bonsai). The code is optimized for Graphics Processing Unit (GPU) accelerators, which enables researchers to take advantage of these powerful computational resources. Bonsai-SPH produces simulation results comparable with state-of-the-art CPU-based codes, but uses an order of magnitude less computation time. The code is freely available online and the details are described in this work.
Submitted 14 February, 2020; v1 submitted 16 September, 2019;
originally announced September 2019.
-
Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation
Authors:
Ammar Ahmad Awan,
Jeroen Bedorf,
Ching-Hsiang Chu,
Hari Subramoni,
Dhabaleswar K. Panda
Abstract:
TensorFlow has been the most widely adopted Machine/Deep Learning framework. However, little exists in the literature that provides a thorough understanding of the capabilities which TensorFlow offers for the distributed training of large ML/DL models that need computation and communication at scale. Most commonly used distributed training approaches for TF can be categorized as follows: 1) Google Remote Procedure Call (gRPC), 2) gRPC+X: X=(InfiniBand Verbs, Message Passing Interface, and GPUDirect RDMA), and 3) No-gRPC: Baidu Allreduce with MPI, Horovod with MPI, and Horovod with NVIDIA NCCL. In this paper, we provide an in-depth performance characterization and analysis of these distributed training approaches on various GPU clusters including the Piz Daint system (6 on Top500). We perform experiments to gain novel insights along the following vectors: 1) Application-level scalability of DNN training, 2) Effect of Batch Size on scaling efficiency, 3) Impact of the MPI library used for no-gRPC approaches, and 4) Type and size of DNN architectures. Based on these experiments, we present two key insights: 1) Overall, No-gRPC designs achieve better performance compared to gRPC-based approaches for most configurations, and 2) The performance of No-gRPC is heavily influenced by the gradient aggregation using Allreduce. Finally, we propose a truly CUDA-Aware MPI Allreduce design that exploits CUDA kernels and pointer caching to perform large reductions efficiently. Our proposed designs offer 5-17X better performance than NCCL2 for small and medium messages, and reduce latency by 29% for large messages. The proposed optimizations help Horovod-MPI to achieve approximately 90% scaling efficiency for ResNet-50 training on 64 GPUs. Further, Horovod-MPI achieves 1.8X and 3.2X higher throughput than the native gRPC method for ResNet-50 and MobileNet, respectively, on the Piz Daint cluster.
Submitted 25 October, 2018;
originally announced October 2018.
-
Modeling the Milky Way as a Dry Galaxy
Authors:
Michiko S. Fujii,
Jeroen Bédorf,
Junichi Baba,
Simon Portegies Zwart
Abstract:
We construct a model for the Milky Way Galaxy composed of a stellar disc and bulge embedded in a dark-matter halo. All components are modelled as $N$-body systems with up to 8 billion equal-mass particles and integrated up to an age of 10 Gyr. We find that net angular-momentum of the dark-matter halo with a spin parameter of $λ=0.06$ is required to form a relatively short bar ($\sim 4$ kpc) with a high pattern speed (40--50 km s$^{-1}$). By comparing our model with observations of the Milky Way Galaxy, we conclude that a disc mass of $\sim 3.7\times10^{10}M_{\odot}$ and an initial bulge scale length and velocity of $\sim 1$ kpc and $\sim 300$ km s$^{-1}$, respectively, fit best to the observations. The disc-to-total mass fraction ($f_{\rm d}$) appears to be an important parameter for the evolution of the Galaxy and models with $f_{\rm d}\sim 0.45$ are most similar to the Milky Way Galaxy. In addition, we compare the velocity distribution in the solar neighbourhood in our simulations with observations in the Milky Way Galaxy. In our simulations the observed gap in the velocity distribution, which is expected to be caused by the outer Lindblad resonance (the so-called Hercules stream), appears to be a time-dependent structure. The velocity distribution changes on a time scale of 20--30 Myr and therefore it is difficult to estimate the pattern speed of the bar from the shape of the local velocity distribution alone.
Submitted 10 October, 2018; v1 submitted 26 July, 2018;
originally announced July 2018.
-
The dynamics of stellar disks in live dark-matter halos
Authors:
Michiko S. Fujii,
Jeroen Bédorf,
Junichi Baba,
Simon Portegies Zwart
Abstract:
Recent developments in computer hardware and software enable researchers to simulate the self-gravitating evolution of galaxies at a resolution comparable to the actual number of stars. Here we present the results of a series of such simulations. We performed $N$-body simulations of disk galaxies with between 100 and 500 million particles over a wide range of initial conditions. Our calculations include a live bulge, disk, and dark matter halo, each of which is represented by self-gravitating particles in the $N$-body code. The simulations are performed using the gravitational $N$-body tree-code Bonsai running on the Piz Daint supercomputer. We find that the time scale over which the bar forms increases exponentially with decreasing disk-mass fraction and that the bar formation epoch exceeds a Hubble time when the disk-mass fraction is $\sim0.35$. These results can be explained with the swing-amplification theory. The condition for the formation of $m=2$ spirals is consistent with that for the formation of the bar, which is also an $m=2$ phenomenon. We further argue that the non-barred grand-design spiral galaxies are transitional, and that they evolve to barred galaxies on a dynamical timescale. We also confirm that the disk-mass fraction and shear rate are important parameters for the morphology of disk galaxies. The former affects the number of spiral arms and the bar formation epoch, and the latter determines the pitch angle of the spiral arms.
Submitted 16 March, 2018; v1 submitted 30 November, 2017;
originally announced December 2017.
-
The origin of interstellar asteroidal objects like 1I/2017 U1 'Oumuamua
Authors:
Simon Portegies Zwart,
Santiago Torres,
Inti Pelupessy,
Jeroen Bedorf,
Maxwell Cai
Abstract:
We study the origin of the interstellar object 1I/2017 U1 'Oumuamua by juxtaposing estimates based on the observations with simulations. We speculate that objects like 'Oumuamua are formed in the debris disc as leftovers from the star and planet formation process, and subsequently liberated. The liberation process is mediated either by interaction with other stars in the parental star-cluster, by resonant interactions within the planetesimal disc, or by the relatively sudden mass loss when the host star becomes a compact object. Integrating backward in time in the Galactic potential together with stars from the Gaia-TGAS catalogue, we find that about 1.3 Myr ago 'Oumuamua passed the nearby star HIP 17288 within a mean distance of $1.3$ pc. By comparing nearby observed L-dwarfs with simulations of the Galaxy we conclude that the kinematics of 'Oumuamua is consistent with relatively young objects of $1.1$--$1.7$ Gyr. We just met 'Oumuamua by chance, and with a derived mean Galactic density of $\sim 3\times 10^{5}$ similarly sized objects within 100 au from the Sun or $\sim 10^{14}$ per cubic parsec we expect about 2 to 12 such visitors per year within 1 au from the Sun.
Submitted 11 May, 2018; v1 submitted 9 November, 2017;
originally announced November 2017.
-
Sapporo2: A versatile direct $N$-body library
Authors:
Jeroen Bédorf,
Evghenii Gaburov,
Simon Portegies Zwart
Abstract:
Astrophysical direct $N$-body methods were among the first production algorithms to be implemented using NVIDIA's CUDA architecture. Now, almost seven years later, the GPU is the most used accelerator device in astronomy for simulating stellar systems. In this paper we present the implementation of the Sapporo2 $N$-body library, which allows researchers to use the GPU for $N$-body simulations with little to no effort. The first version, released five years ago, is actively used, but lacks advanced features and versatility in numerical precision and support for higher order integrators. In this updated version we have rebuilt the code from scratch and added support for OpenCL, multi-precision and higher order integrators. We show how to tune these codes for different GPU architectures and how to keep utilizing the GPU optimally even when only a small number of particles ($N < 100$) is integrated. This careful tuning allows Sapporo2 to be faster than Sapporo1 even with the added options and double precision data loads. The code runs on a range of NVIDIA and AMD GPUs in single and double precision accuracy. With the addition of OpenCL support the library is also able to run on CPUs and other accelerators that support OpenCL.
Submitted 14 October, 2015;
originally announced October 2015.
-
24.77 Pflops on a Gravitational Tree-Code to Simulate the Milky Way Galaxy with 18600 GPUs
Authors:
Jeroen Bédorf,
Evghenii Gaburov,
Michiko S. Fujii,
Keigo Nitadori,
Tomoaki Ishiyama,
Simon Portegies Zwart
Abstract:
We have simulated, for the first time, the long term evolution of the Milky Way Galaxy using 51 billion particles on the Swiss Piz Daint supercomputer with our $N$-body gravitational tree-code Bonsai. Herein, we describe the scientific motivation and numerical algorithms. The Milky Way model was simulated for 6 billion years, during which the bar structure and spiral arms were fully formed. This improves upon previous simulations by using 1000 times more particles, and provides a wealth of new data that can be directly compared with observations. We also report the scalability on both the Swiss Piz Daint and the US ORNL Titan. On Piz Daint the parallel efficiency of Bonsai was above 95%. The highest performance was achieved with a 242 billion particle Milky Way model using 18600 GPUs on Titan, thereby reaching a sustained GPU and application performance of 33.49 Pflops and 24.77 Pflops respectively.
Submitted 1 December, 2014;
originally announced December 2014.
-
Computational Gravitational Dynamics with Modern Numerical Accelerators
Authors:
Simon Portegies Zwart,
Jeroen Bédorf
Abstract:
We review the recent optimizations of gravitational $N$-body kernels for running them on graphics processing units (GPUs), on single hosts and massive parallel platforms. For each of the two main $N$-body techniques, direct summation and tree-codes, we discuss the optimization strategy, which is different for each algorithm. Because both the accuracy as well as the performance characteristics differ, hybridizing the two algorithms is essential when simulating a large $N$-body system with high-density structures containing few particles, and with low-density structures containing many particles. We demonstrate how this can be realized by splitting the underlying Hamiltonian, and we subsequently demonstrate the efficiency and accuracy of the hybrid code by simulating a group of 11 merging galaxies with massive black holes in the nuclei.
Submitted 18 September, 2014;
originally announced September 2014.
-
The Effect of Many Minor Mergers on the Size Growth of Compact Quiescent Galaxies
Authors:
Jeroen Bédorf,
Simon Portegies Zwart
Abstract:
Massive galaxies with a half-mass radius <~ 1 kpc are observed in the early universe (z~>2), but not in the local universe. In the local universe similar-mass (within a factor of two) galaxies tend to be a factor of 4 to 5 larger. Dry minor mergers are known to drive the evolution of the size of a galaxy without much increasing the mass, but it is unclear if the growth in size is sufficient to explain the observations. We test the hypothesis that galaxies grow through dry minor mergers by simulating merging galaxies with mass ratios of q=1:1 (equal mass) to q=1:160. In our N-body simulations the total mass of the parent galaxy doubles. We confirm that major mergers do not cause a sufficient growth in size. The observation can be explained with mergers with a mass ratio of q=1:5--1:10. Smaller mass ratios cause a more dramatic growth in size, up to a factor of ~17 for mergers with a mass ratio of 1:80. For relatively massive minor mergers q ~> 1:20 the mass of the incoming child galaxies tends to settle in the halo of the parent galaxy. This is caused by the tidal stripping of the child galaxies by the time they enter the central portion of the parent. When the accretion of minor galaxies becomes more continuous, when q <~ 1:40, the foreign mass tends to concentrate more in the central region of the parent galaxy. We speculate that this is caused by dynamic interactions between the child galaxies inside the merger remnant and the longer merging times when the difference in mass is larger. These interactions cause dynamical heating which results in accretion of mass inside the galaxy core and a reduction of the parent's circular velocity and density.
Submitted 28 January, 2013;
originally announced January 2013.
-
A pilgrimage to gravity on GPUs
Authors:
Jeroen Bédorf,
Simon Portegies Zwart
Abstract:
In this short review we present the developments over the last 5 decades that have led to the use of Graphics Processing Units (GPUs) for astrophysical simulations. Since the introduction of NVIDIA's Compute Unified Device Architecture (CUDA) in 2007 the GPU has become a valuable tool for N-body simulations and is so popular these days that almost all papers about high precision N-body simulations use methods that are accelerated by GPUs. With the GPU hardware becoming more advanced and being used for more advanced algorithms like gravitational tree-codes we see a bright future for GPU like hardware in computational astrophysics.
Submitted 13 April, 2012;
originally announced April 2012.
-
Bonsai: A GPU Tree-Code
Authors:
Jeroen Bédorf,
Evghenii Gaburov,
Simon Portegies Zwart
Abstract:
We present a gravitational hierarchical N-body code that is designed to run efficiently on Graphics Processing Units (GPUs). All parts of the algorithm are executed on the GPU which eliminates the need for data transfer between the Central Processing Unit (CPU) and the GPU. Our tests indicate that the gravitational tree-code outperforms tuned CPU code for all parts of the algorithm and show an overall performance improvement of more than a factor 20, resulting in a processing rate of more than 2.8 million particles per second.
Submitted 10 April, 2012;
originally announced April 2012.
-
A sparse octree gravitational N-body code that runs entirely on the GPU processor
Authors:
Jeroen Bédorf,
Evghenii Gaburov,
Simon Portegies Zwart
Abstract:
We present parallel algorithms for constructing and traversing sparse octrees on graphics processing units (GPUs). The algorithms are based on parallel-scan and sort methods. To test the performance and feasibility, we implemented them in CUDA in the form of a gravitational tree-code which completely runs on the GPU. (The code is publicly available at: http://castle.strw.leidenuniv.nl/software.html.) The tree construction and traverse algorithms are portable to many-core devices which have support for CUDA or OpenCL programming languages. The gravitational tree-code outperforms tuned CPU code during the tree-construction and shows a performance improvement of more than a factor 20 overall, resulting in a processing rate of more than 2.8 million particles per second.
Submitted 10 April, 2012; v1 submitted 9 June, 2011;
originally announced June 2011.
-
Gravitational tree-code on graphics processing units: implementation in CUDA
Authors:
Evghenii Gaburov,
Jeroen Bédorf,
Simon Portegies Zwart
Abstract:
We present a new very fast tree-code which runs on massively parallel Graphical Processing Units (GPU) with NVIDIA CUDA architecture. The tree-construction and calculation of multipole moments are carried out on the host CPU, while the force calculation, which consists of tree walks and evaluation of interaction lists, is carried out on the GPU. In this way we achieve a sustained performance of about 100 GFLOP/s and data transfer rates of about 50 GB/s. It takes about a second to compute forces on a million particles with an opening angle of $θ\approx 0.5$. The code has a convenient user interface and is freely available for use (http://castle.strw.leidenuniv.nl/software/octgrav.html).
Submitted 28 May, 2010;
originally announced May 2010.
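The opening angle quoted in the abstract above controls when a tree walk treats a distant cell as a single multipole rather than descending into its children. The sketch below shows the classic Barnes-Hut acceptance test as an illustration only. It is an assumption that this simple form applies here, since the abstract does not state which variant of the criterion the code uses, and the function and parameter names are hypothetical.

```python
import math


def accept_cell(cell_size, cell_com, target, theta=0.5):
    """Classic Barnes-Hut test: accept a cell if it subtends a small angle.

    cell_size: side length of the cubic cell
    cell_com:  (x, y, z) centre of mass of the cell
    target:    (x, y, z) position where the force is evaluated
    theta:     opening angle; smaller values open more cells
    """
    distance = math.dist(cell_com, target)
    # A cell at zero distance contains the target and must be opened.
    return distance > 0.0 and cell_size / distance < theta
```

With theta = 0.5, a unit-sized cell ten units away is accepted as one interaction, while the same cell one unit away must be opened and its children examined.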
-
High Performance Direct Gravitational N-body Simulations on Graphics Processing Units -- II: An implementation in CUDA
Authors:
Robert G. Belleman,
Jeroen Bedorf,
Simon Portegies Zwart
Abstract:
We present the results of gravitational direct $N$-body simulations using the Graphics Processing Unit (GPU) on a commercial NVIDIA GeForce 8800GTX designed for gaming computers. The force evaluation of the $N$-body problem is implemented in "Compute Unified Device Architecture" (CUDA) using the GPU to speed up the calculations. We tested the implementation on three different $N$-body codes: two direct $N$-body integration codes, using the 4th order predictor-corrector Hermite integrator with block time-steps, and one Barnes-Hut treecode, which uses a 2nd order leapfrog integration scheme. The integration of the equations of motion for all codes is performed on the host CPU.
We find that for $N > 512$ particles the GPU outperforms the GRAPE-6Af, if some softening in the force calculation is accepted. Without softening and for very small integration time steps the GRAPE still outperforms the GPU. We conclude that modern GPUs offer an attractive alternative to GRAPE-6Af special purpose hardware. Using the same time-step criterion, the total energy of the $N$-body system was conserved better than to one in $10^6$ on the GPU, only about an order of magnitude worse than obtained with GRAPE-6Af. For $N \gtrsim 10^5$ the 8800GTX outperforms the host CPU by a factor of about 100 and runs at about the same speed as the GRAPE-6Af.
Submitted 16 July, 2007; v1 submitted 3 July, 2007;
originally announced July 2007.