Ripples spreading across the Galactic disc: Interplay of direct and indirect effects of the Sagittarius dwarf impact

Authors: Tetsuro Asano, Michiko S. Fujii, Junichi Baba, Simon Portegies Zwart, Jeroen Bédorf

Abstract: Gaia data have revealed vertically asymmetric phase-space structures in the Milky Way (MW) disc, such as phase spirals, indicating vertical oscillations. These oscillations exhibit two distinct modes: the bending mode and the breathing mode, associated with one-arm and two-arm phase spirals, respectively. This study aims to explore the excitation mechanisms of the bending and breathing modes and their subsequent evolution in the MW disc, focusing on the interplay between direct perturbations from the Sagittarius dwarf galaxy and indirect contributions from tidally induced spiral arms. We perform high-resolution $N$-body simulations to model the interaction between an MW-like disc galaxy and a Sagittarius dwarf-like satellite. These simulations resolve fine phase-space structures, enabling analysis of the bending and breathing modes at both macroscopic (global bending and breathing waves) and microscopic (local phase spirals) scales. Our simulations demonstrate that the satellite's perturbation directly excites the bending mode and induces spiral arms in the galactic disc. These spiral arms excite the breathing mode, making it an indirect consequence of the satellite interaction. Initially, the bending mode dominates, but it rapidly decays due to horizontal mixing. In contrast, the breathing mode persists for a longer duration, sustained by the spiral arms, leading to a transition from a bending-dominated to a breathing-dominated state. This transition progresses faster in the inner galaxy than in the outer regions. The simulations reproduce the one-arm phase spiral observed in the solar neighbourhood and reveal two-arm phase spirals, particularly in the inner galaxy, associated with spiral arm-induced breathing modes. Our findings highlight the combined effects of direct satellite perturbations and indirect spiral arm dynamics in shaping the vertical structure of the MW disc.

Submitted 21 January, 2025; originally announced January 2025.

Comments: 22 pages, 16 figures, submitted to A&A

arXiv:2112.00765

doi 10.1093/mnras/stac1379

Impact of bar resonances in the velocity-space distribution of the solar neighbourhood stars in a self-consistent $N$-body Galactic disc simulation

Authors: Tetsuro Asano, Michiko S. Fujii, Junichi Baba, Jeroen Bédorf, Elena Sellentin, Simon Portegies Zwart

Abstract: The velocity-space distribution of the solar neighbourhood stars shows complex substructures. Most of the previous studies use static potentials to investigate their origins. Instead, we use a self-consistent $N$-body model of the Milky Way, whose potential is asymmetric and evolves with time. In this paper, we quantitatively evaluate the similarity between the velocity-space distributions in the $N$-body model and that of the solar neighbourhood, using the Kullback-Leibler divergence (KLD). The KLD analysis shows the time evolution and spatial variation of the velocity-space distribution. The KLD fluctuates with time, which indicates that the velocity-space distribution at a fixed position is not always similar to that of the solar neighbourhood. Some positions show velocity-space distributions with small KLDs (high similarities) more frequently than others. One of them is located at $(R,φ)=(8.2\;\mathrm{kpc}, 30^{\circ})$, where $R$ and $φ$ are the distance from the galactic centre and the angle with respect to the bar's major axis, respectively. The detection frequency is higher in the inter-arm regions than in the arm regions. In the velocity maps with small KLDs, we identify the velocity-space substructures, which consist of particles trapped in bar resonances. The bar resonances have a significant impact on the stellar velocity-space distribution even though the galactic potential is not static.

Submitted 23 May, 2022; v1 submitted 1 December, 2021; originally announced December 2021.

Comments: 9 pages, 11 figures. Accepted by MNRAS
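
The KLD comparison described in the abstract above can be illustrated with a minimal sketch. It assumes both velocity-space distributions are binned on the same $(v_R, v_φ)$ grid; the function names, the bin layout, the small floor `eps`, and the direction of the divergence (observed relative to model) are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def velocity_histogram(v_r, v_phi, bins, value_range):
    """Bin velocities on a fixed 2D grid (hypothetical helper)."""
    hist, _, _ = np.histogram2d(v_r, v_phi, bins=bins, range=value_range)
    return hist

def kl_divergence(p_counts, q_counts, eps=1e-12):
    """Discrete Kullback-Leibler divergence D(P || Q) between two histograms.

    Both inputs are bin counts on the same grid; each is normalised to sum to 1.
    Bins that are empty in Q are floored at eps (an assumption made here to
    keep the logarithm finite).
    """
    p = np.asarray(p_counts, dtype=float).ravel()
    q = np.asarray(q_counts, dtype=float).ravel()
    p = p / p.sum()
    q = np.clip(q / q.sum(), eps, None)
    mask = p > 0  # bins with p = 0 contribute nothing to the sum
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```

A small value means the two binned distributions are similar, and identical histograms give zero. Because the divergence is not symmetric, swapping the arguments generally changes the result.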

arXiv:2107.06294

doi 10.1093/mnras/stab2580

Resolving local and global kinematic signatures of satellite mergers with billion particle simulations

Authors: Jason A. S. Hunt, Ioana A. Stelea, Kathryn V. Johnston, Suroor S. Gandhi, Chervin F. P. Laporte, Jeroen Bedorf

Abstract: In this work we present two new $\sim10^9$ particle self-consistent simulations of the merger of a Sagittarius-like dwarf galaxy with a Milky Way-like disc galaxy. One model is a violent merger creating a thick disc, and a Gaia-Enceladus/Sausage like remnant. The other is a highly stable disc which we use to illustrate how the improved phase space resolution allows us to better examine the formation and evolution of structures that have been observed in small, local volumes in the Milky Way, such as the $z-v_z$ phase spiral and clustering in the $v_{\mathrm{R}}-v_φ$ plane when compared to previous works. The local $z-v_z$ phase spirals are clearly linked to the global asymmetry across the disc: we find both 2-armed and 1-armed phase spirals, which are related to breathing and bending behaviors respectively. Hercules-like moving groups are common, clustered in $v_{\mathrm{R}}-v_φ$ in local data samples in the simulation. These groups migrate outwards from the inner galaxy, matching observed metallicity trends even in the absence of a galactic bar. We currently release the best fitting `present day' merger snapshots along with the unperturbed galaxies for comparison.

Submitted 11 September, 2021; v1 submitted 13 July, 2021; originally announced July 2021.

Comments: 16 pages, 14 figures, accepted by MNRAS

arXiv:2010.11630

doi 10.1109/DLS51937.2020.00012

DeepGalaxy: Deducing the Properties of Galaxy Mergers from Images Using Deep Neural Networks

Authors: Maxwell X. Cai, Jeroen Bédorf, Vikram A. Saletore, Valeriu Codreanu, Damian Podareanu, Adel Chaibi, Penny X. Qian

Abstract: Galaxy mergers, the dynamical process during which two galaxies collide, are among the most spectacular phenomena in the Universe. During this process, the two colliding galaxies are tidally disrupted, producing significant visual features that evolve as a function of time. These visual features contain valuable clues for deducing the physical properties of the galaxy mergers. In this work, we propose DeepGalaxy, a visual analysis framework trained to predict the physical properties of galaxy mergers based on their morphology. Based on an encoder-decoder architecture, DeepGalaxy encodes the input images to a compressed latent space $z$, and determines the similarity of images according to the latent-space distance. DeepGalaxy consists of a fully convolutional autoencoder (FCAE) which generates activation maps at its 3D latent-space, and a variational autoencoder (VAE) which compresses the activation maps into a 1D vector, and a classifier that generates labels from the activation maps. The backbone of the FCAE can be fully customized according to the complexity of the images. DeepGalaxy demonstrates excellent scaling performance on parallel machines. On the Endeavour supercomputer, the scaling efficiency exceeds 0.93 when trained on 128 workers, and it maintains above 0.73 when trained with 512 workers. Without having to carry out expensive numerical simulations, DeepGalaxy makes inferences of the physical properties of galaxy mergers directly from images, and thereby achieves a speedup factor of $\sim 10^5$.

Submitted 22 October, 2020; originally announced October 2020.

Comments: 7 pages, 7 figures. Accepted for publication at the 2020 IEEE/ACM Fifth Workshop on Deep Learning on Supercomputers (DLS)

arXiv:2005.14049

doi 10.1093/mnras/staa2849

Trimodal structure of Hercules stream explained by originating from bar resonances

Authors: Tetsuro Asano, Michiko S. Fujii, Junichi Baba, Jeroen Bédorf, Elena Sellentin, Simon Portegies Zwart

Abstract: Gaia Data Release 2 revealed detailed structures of nearby stars in phase space. These include the Hercules stream, whose origin is still debated. Most of the previous numerical studies conjectured that the observed structures originate from orbits in resonance with the bar, based on static potential models for the Milky Way. We, in contrast, approach the problem via a self-consistent, dynamic, and morphologically well-resolved model, namely a full $N$-body simulation of the Milky Way. Our simulation comprises about 5.1 billion particles in the galactic stellar bulge, bar, disk, and dark-matter halo and is evolved to 10 Gyr. Our model's disk component is composed of 200 million particles, and its simulation snapshots are stored every 10 Myr, enabling us to resolve and classify resonant orbits of representative samples of stars. After choosing the Sun's position in the simulation, we compare the distribution of stars in its neighborhood with Gaia's astrometric data, thereby establishing the role of identified resonantly trapped stars in the formation of Hercules-like structures. From our orbital spectral-analysis we identify multiple, especially higher order resonances. Our results suggest that the Hercules stream is dominated by the 4:1 and 5:1 outer Lindblad and corotation resonances. In total, this yields a trimodal structure of the Hercules stream. From the relation between resonances and ridges in phase space, our model favored a slow pattern speed of the Milky-Way bar (40--45 $\mathrm{km \; s^{-1} \; kpc^{-1}}$).

Submitted 15 September, 2020; v1 submitted 28 May, 2020; originally announced May 2020.

Comments: 11 pages, 9 figures, MNRAS accepted

arXiv:1909.07439

Bonsai-SPH: A GPU accelerated astrophysical Smoothed Particle Hydrodynamics code

Authors: Jeroen Bédorf, Simon Portegies Zwart

Abstract: We present the smoothed-particle hydrodynamics simulation code, Bonsai-SPH, which is a continuation of our previously developed gravity-only hierarchical $N$-body code (called Bonsai). The code is optimized for Graphics Processing Unit (GPU) accelerators, which enables researchers to take advantage of these powerful computational resources. Bonsai-SPH produces simulation results comparable with state-of-the-art CPU-based codes, but using an order of magnitude less computation time. The code is freely available online and the details are described in this work.

Submitted 14 February, 2020; v1 submitted 16 September, 2019; originally announced September 2019.

Comments: Updated intro and multi-GPU sections. 23 pages, 9 figures. Submission to SciPost

arXiv:1810.11112

doi 10.1109/CCGRID.2019.00064

Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation

Authors: Ammar Ahmad Awan, Jeroen Bedorf, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda

Abstract: TensorFlow has been the most widely adopted Machine/Deep Learning framework. However, little exists in the literature that provides a thorough understanding of the capabilities which TensorFlow offers for the distributed training of large ML/DL models that need computation and communication at scale. Most commonly used distributed training approaches for TF can be categorized as follows: 1) Google Remote Procedure Call (gRPC), 2) gRPC+X: X=(InfiniBand Verbs, Message Passing Interface, and GPUDirect RDMA), and 3) No-gRPC: Baidu Allreduce with MPI, Horovod with MPI, and Horovod with NVIDIA NCCL. In this paper, we provide an in-depth performance characterization and analysis of these distributed training approaches on various GPU clusters including the Piz Daint system (6 on Top500). We perform experiments to gain novel insights along the following vectors: 1) Application-level scalability of DNN training, 2) Effect of Batch Size on scaling efficiency, 3) Impact of the MPI library used for no-gRPC approaches, and 4) Type and size of DNN architectures. Based on these experiments, we present two key insights: 1) Overall, No-gRPC designs achieve better performance compared to gRPC-based approaches for most configurations, and 2) The performance of No-gRPC is heavily influenced by the gradient aggregation using Allreduce. Finally, we propose a truly CUDA-Aware MPI Allreduce design that exploits CUDA kernels and pointer caching to perform large reductions efficiently. Our proposed designs offer 5-17X better performance than NCCL2 for small and medium messages, and reduce latency by 29% for large messages. The proposed optimizations help Horovod-MPI to achieve approximately 90% scaling efficiency for ResNet-50 training on 64 GPUs. Further, Horovod-MPI achieves 1.8X and 3.2X higher throughput than the native gRPC method for ResNet-50 and MobileNet, respectively, on the Piz Daint cluster.

Submitted 25 October, 2018; originally announced October 2018.

Comments: 10 pages, 9 figures, submitted to IEEE IPDPS 2019 for peer-review

Journal ref: IEEE CCGrid, 2019

arXiv:1807.10019

doi 10.1093/mnras/sty2747

Modeling the Milky Way as a Dry Galaxy

Authors: Michiko S. Fujii, Jeroen Bédorf, Junichi Baba, Simon Portegies Zwart

Abstract: We construct a model for the Milky Way Galaxy composed of a stellar disc and bulge embedded in a dark-matter halo. All components are modelled as $N$-body systems with up to 8 billion equal-mass particles and integrated up to an age of 10\,Gyr. We find that net angular-momentum of the dark-matter halo with a spin parameter of $λ=0.06$ is required to form a relatively short bar ($\sim 4$\,kpc) with a high pattern speed (40--50\,km\,s$^{-1}$). By comparing our model with observations of the Milky Way Galaxy, we conclude that a disc mass of $\sim 3.7\times10^{10}M_{\odot}$ and an initial bulge scale length and velocity of $\sim 1$\,kpc and $\sim 300$\,km\,s$^{-1}$, respectively, fit best to the observations. The disc-to-total mass fraction ($f_{\rm d}$) appears to be an important parameter for the evolution of the Galaxy and models with $f_{\rm d}\sim 0.45$ are most similar to the Milky Way Galaxy. In addition, we compare the velocity distribution in the solar neighbourhood in our simulations with observations in the Milky Way Galaxy. In our simulations the observed gap in the velocity distribution, which is expected to be caused by the outer Lindblad resonance (the so-called Hercules stream), appears to be a time-dependent structure. The velocity distribution changes on a time scale of 20--30\,Myr and therefore it is difficult to estimate the pattern speed of the bar from the shape of the local velocity distribution alone.

Submitted 10 October, 2018; v1 submitted 26 July, 2018; originally announced July 2018.

Comments: Accepted by MNRAS, 32 pages, 31 figures

arXiv:1712.00058

doi 10.1093/mnras/sty711

The dynamics of stellar disks in live dark-matter halos

Authors: Michiko S. Fujii, Jeroen Bédorf, Junichi Baba, Simon Portegies Zwart

Abstract: Recent developments in computer hardware and software enable researchers to simulate the self-gravitating evolution of galaxies at a resolution comparable to the actual number of stars. Here we present the results of a series of such simulations. We performed $N$-body simulations of disk galaxies with between 100 and 500 million particles over a wide range of initial conditions. Our calculations include a live bulge, disk, and dark matter halo, each of which is represented by self-gravitating particles in the $N$-body code. The simulations are performed using the gravitational $N$-body tree-code Bonsai running on the Piz Daint supercomputer. We find that the time scale over which the bar forms increases exponentially with decreasing disk-mass fraction and that the bar formation epoch exceeds a Hubble time when the disk-mass fraction is $\sim0.35$. These results can be explained with the swing-amplification theory. The condition for the formation of $m=2$ spirals is consistent with that for the formation of the bar, which is also an $m=2$ phenomenon. We further argue that the non-barred grand-design spiral galaxies are transitional, and that they evolve to barred galaxies on a dynamical timescale. We also confirm that the disk-mass fraction and shear rate are important parameters for the morphology of disk galaxies. The former affects the number of spiral arms and the bar formation epoch, and the latter determines the pitch angle of the spiral arms.

Submitted 16 March, 2018; v1 submitted 30 November, 2017; originally announced December 2017.

Comments: 23 pages; 29 figures. Accepted by MNRAS

arXiv:1711.03558

doi 10.1093/mnrasl/sly088

The origin of interstellar asteroidal objects like 1I/2017 U1 'Oumuamua

Authors: Simon Portegies Zwart, Santiago Torres, Inti Pelupessy, Jeroen Bedorf, Maxwell Cai

Abstract: We study the origin of the interstellar object 1I/2017 U1 'Oumuamua by juxtaposing estimates based on the observations with simulations. We speculate that objects like 'Oumuamua are formed in the debris disc as left over from the star and planet formation process, and subsequently liberated. The liberation process is mediated either by interaction with other stars in the parental star-cluster, by resonant interactions within the planetesimal disc or by the relatively sudden mass loss when the host star becomes a compact object. Integrating backward in time in the Galactic potential together with stars from the Gaia-TGAS catalogue we find that about 1.3Myr ago 'Oumuamua passed the nearby star HIP 17288 within a mean distance of $1.3$pc. By comparing nearby observed L-dwarfs with simulations of the Galaxy we conclude that the kinematics of 'Oumuamua is consistent with relatively young objects of $1.1$--$1.7$Gyr. We just met 'Oumuamua by chance, and with a derived mean Galactic density of $\sim 3\times 10^{5}$ similarly sized objects within 100\,au from the Sun or $\sim 10^{14}$ per cubic parsec we expect about 2 to 12 such visitors per year within 1au from the Sun.

Submitted 11 May, 2018; v1 submitted 9 November, 2017; originally announced November 2017.

Comments: MNRAS (in press)

arXiv:1510.04068

Sapporo2: A versatile direct $N$-body library

Authors: Jeroen Bédorf, Evghenii Gaburov, Simon Portegies Zwart

Abstract: Astrophysical direct $N$-body methods have been one of the first production algorithms to be implemented using NVIDIA's CUDA architecture. Now, almost seven years later, the GPU is the most used accelerator device in astronomy for simulating stellar systems. In this paper we present the implementation of the Sapporo2 $N$-body library, which allows researchers to use the GPU for $N$-body simulations with little to no effort. The first version, released five years ago, is actively used, but lacks advanced features and versatility in numerical precision and support for higher order integrators. In this updated version we have rebuilt the code from scratch and added support for OpenCL, multi-precision and higher order integrators. We show how to tune these codes for different GPU architectures and present how to continue utilizing the GPU optimally even when only a small number of particles ($N < 100$) is integrated. This careful tuning allows Sapporo2 to be faster than Sapporo1 even with the added options and double precision data loads. The code runs on a range of NVIDIA and AMD GPUs in single and double precision accuracy. With the addition of OpenCL support the library is also able to run on CPUs and other accelerators that support OpenCL.

Submitted 14 October, 2015; originally announced October 2015.

Comments: 15 pages, 7 figures. Accepted for publication in Computational Astrophysics and Cosmology
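
For readers unfamiliar with the direct $N$-body method named in the abstract above, the following is a minimal CPU sketch of the $O(N^2)$ pairwise force evaluation that such libraries accelerate. It is not the Sapporo2 API: the function name, the Plummer softening length `eps`, and the unit choice `G = 1` are assumptions made for illustration only.

```python
import numpy as np

def direct_accelerations(pos, mass, eps=1e-3, G=1.0):
    """Softened gravitational accelerations by direct summation over all pairs.

    pos  : (N, 3) array of positions
    mass : (N,) array of masses
    eps  : Plummer softening length (assumed here to avoid singular forces)
    """
    pos = np.asarray(pos, dtype=float)
    mass = np.asarray(mass, dtype=float)
    # dx[i, j] is the vector pointing from particle i to particle j.
    dx = pos[None, :, :] - pos[:, None, :]
    r2 = np.sum(dx * dx, axis=-1) + eps * eps
    inv_r3 = r2 ** -1.5
    np.fill_diagonal(inv_r3, 0.0)  # a particle exerts no force on itself
    # a_i = G * sum_j m_j * dx_ij / (r_ij^2 + eps^2)^(3/2)
    return G * np.einsum("ij,ijk->ik", mass[None, :] * inv_r3, dx)
```

The cost grows with the square of the particle number, which is why the pairwise loop is the part usually offloaded to an accelerator.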

arXiv:1412.0659

doi 10.1109/SC.2014.10

24.77 Pflops on a Gravitational Tree-Code to Simulate the Milky Way Galaxy with 18600 GPUs

Authors: Jeroen Bédorf, Evghenii Gaburov, Michiko S. Fujii, Keigo Nitadori, Tomoaki Ishiyama, Simon Portegies Zwart

Abstract: We have simulated, for the first time, the long term evolution of the Milky Way Galaxy using 51 billion particles on the Swiss Piz Daint supercomputer with our $N$-body gravitational tree-code Bonsai. Herein, we describe the scientific motivation and numerical algorithms. The Milky Way model was simulated for 6 billion years, during which the bar structure and spiral arms were fully formed. This improves upon previous simulations by using 1000 times more particles, and provides a wealth of new data that can be directly compared with observations. We also report the scalability on both the Swiss Piz Daint and the US ORNL Titan. On Piz Daint the parallel efficiency of Bonsai was above 95%. The highest performance was achieved with a 242 billion particle Milky Way model using 18600 GPUs on Titan, thereby reaching a sustained GPU and application performance of 33.49 Pflops and 24.77 Pflops respectively.

Submitted 1 December, 2014; originally announced December 2014.

Comments: 12 pages, 4 figures, Published in: 'Proceeding SC '14 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis'. Gordon Bell Prize 2014 finalist

arXiv:1409.5474

Computational Gravitational Dynamics with Modern Numerical Accelerators

Authors: Simon Portegies Zwart, Jeroen Bédorf

Abstract: We review the recent optimizations of gravitational $N$-body kernels for running them on graphics processing units (GPUs), on single hosts and massively parallel platforms. For each of the two main $N$-body techniques, direct summation and tree-codes, we discuss the optimization strategy, which is different for each algorithm. Because both the accuracy and the performance characteristics differ, hybridizing the two algorithms is essential when simulating a large $N$-body system with high-density structures containing few particles, and with low-density structures containing many particles. We demonstrate how this can be realized by splitting the underlying Hamiltonian, and we subsequently demonstrate the efficiency and accuracy of the hybrid code by simulating a group of 11 merging galaxies with massive black holes in the nuclei.

Submitted 18 September, 2014; originally announced September 2014.

Comments: Accepted for publication in IEEE Computer

arXiv:1301.6784

doi 10.1093/mnras/stt208

The Effect of Many Minor Mergers on the Size Growth of Compact Quiescent Galaxies

Authors: Jeroen Bédorf, Simon Portegies Zwart

Abstract: Massive galaxies with a half-mass radius <~ 1kpc are observed in the early universe (z~>2), but not in the local universe. In the local universe similar-mass (within a factor of two) galaxies tend to be a factor of 4 to 5 larger. Dry minor mergers are known to drive the evolution of the size of a galaxy without much increasing the mass, but it is unclear if the growth in size is sufficient to explain the observations. We test the hypothesis that galaxies grow through dry minor mergers by simulating merging galaxies with mass ratios of q=1:1 (equal mass) to q=1:160. In our N-body simulations the total mass of the parent galaxy doubles. We confirm that major mergers do not cause a sufficient growth in size. The observation can be explained with mergers with a mass ratio of q=1:5--1:10. Smaller mass ratios cause a more dramatic growth in size, up to a factor of ~17 for mergers with a mass ratio of 1:80. For relatively massive minor mergers q ~> 1:20 the mass of the incoming child galaxies tends to settle in the halo of the parent galaxy. This is caused by the tidal stripping of the child galaxies by the time they enter the central portion of the parent. When the accretion of minor galaxies becomes more continuous, when q <~ 1:40, the foreign mass tends to concentrate more in the central region of the parent galaxy. We speculate that this is caused by dynamic interactions between the child galaxies inside the merger remnant and the longer merging times when the difference in mass is larger. These interactions cause dynamical heating which results in accretion of mass inside the galaxy core and a reduction of the parent's circular velocity and density.

Submitted 28 January, 2013; originally announced January 2013.

Comments: 16 pages, 19 figures, accepted by MNRAS

arXiv:1204.3106

doi 10.1140/epjst/e2012-1647-6

A pilgrimage to gravity on GPUs

Authors: Jeroen Bédorf, Simon Portegies Zwart

Abstract: In this short review we present the developments over the last 5 decades that have led to the use of Graphics Processing Units (GPUs) for astrophysical simulations. Since the introduction of NVIDIA's Compute Unified Device Architecture (CUDA) in 2007 the GPU has become a valuable tool for N-body simulations and is so popular these days that almost all papers about high precision N-body simulations use methods that are accelerated by GPUs. With the GPU hardware becoming more advanced and being used for more advanced algorithms like gravitational tree-codes we see a bright future for GPU like hardware in computational astrophysics.

Submitted 13 April, 2012; originally announced April 2012.

Comments: To appear in: European Physical Journal "Special Topics" : "Computer Simulations on Graphics Processing Units" . 18 pages, 8 figures

arXiv:1204.2280

Bonsai: A GPU Tree-Code

Authors: Jeroen Bédorf, Evghenii Gaburov, Simon Portegies Zwart

Abstract: We present a gravitational hierarchical N-body code that is designed to run efficiently on Graphics Processing Units (GPUs). All parts of the algorithm are executed on the GPU which eliminates the need for data transfer between the Central Processing Unit (CPU) and the GPU. Our tests indicate that the gravitational tree-code outperforms tuned CPU code for all parts of the algorithm and show an overall performance improvement of more than a factor 20, resulting in a processing rate of more than 2.8 million particles per second.

Submitted 10 April, 2012; originally announced April 2012.

Comments: 5 pages, 2 figures. Proceedings of "Advances in Computational Astrophysics: methods, tools and outcomes", June 13-17, 2011, Cefalu, Sicily, Italy, eds. Capuzzo Dolcetta, Limongi, Tornambe and Giobbi

arXiv:1106.1900

doi 10.1016/j.jcp.2011.12.024

A sparse octree gravitational N-body code that runs entirely on the GPU processor

Authors: Jeroen Bédorf, Evghenii Gaburov, Simon Portegies Zwart

Abstract: We present parallel algorithms for constructing and traversing sparse octrees on graphics processing units (GPUs). The algorithms are based on parallel-scan and sort methods. To test the performance and feasibility, we implemented them in CUDA in the form of a gravitational tree-code which runs completely on the GPU. (The code is publicly available at: http://castle.strw.leidenuniv.nl/software.html) The tree construction and traversal algorithms are portable to many-core devices which have support for the CUDA or OpenCL programming languages. The gravitational tree-code outperforms tuned CPU code during the tree-construction and shows a performance improvement of more than a factor 20 overall, resulting in a processing rate of more than 2.8 million particles per second.

Submitted 10 April, 2012; v1 submitted 9 June, 2011; originally announced June 2011.

Comments: Accepted version. Published in Journal of Computational Physics. 35 pages, 12 figures, single column

Journal ref: Journal of Computational Physics. Volume 231, Issue 7, 1 April 2012, Pages 2825-2839

arXiv:1005.5384

doi 10.1016/j.procs.2010.04.124

Gravitational tree-code on graphics processing units: implementation in CUDA

Authors: Evghenii Gaburov, Jeroen Bédorf, Simon Portegies Zwart

Abstract: We present a new very fast tree-code which runs on massively parallel Graphics Processing Units (GPUs) with the NVIDIA CUDA architecture. The tree construction and the calculation of multipole moments are carried out on the host CPU, while the force calculation, which consists of tree walks and evaluation of interaction lists, is carried out on the GPU. In this way we achieve a sustained performance of about 100 GFLOP/s and data transfer rates of about 50 GB/s. It takes about a second to compute forces on a million particles with an opening angle of $\theta \approx 0.5$. The code has a convenient user interface and is freely available for use (http://castle.strw.leidenuniv.nl/software/octgrav.html).

Submitted 28 May, 2010; originally announced May 2010.

Comments: 9 pages, 8 figures. Accepted for publication at International Conference on Computational Science 2010

arXiv:0707.0438

doi 10.1016/j.newast.2007.07.004

High Performance Direct Gravitational N-body Simulations on Graphics Processing Units -- II: An implementation in CUDA

Authors: Robert G. Belleman, Jeroen Bedorf, Simon Portegies Zwart

Abstract: We present the results of gravitational direct $N$-body simulations using the Graphics Processing Unit (GPU) on a commercial NVIDIA GeForce 8800GTX designed for gaming computers. The force evaluation of the $N$-body problem is implemented in "Compute Unified Device Architecture" (CUDA) using the GPU to speed up the calculations. We tested the implementation on three different $N$-body codes: two direct $N$-body integration codes, using the 4th order predictor-corrector Hermite integrator with block time-steps, and one Barnes-Hut treecode, which uses a 2nd order leapfrog integration scheme. The integration of the equations of motion for all codes is performed on the host CPU. We find that for $N > 512$ particles the GPU outperforms the GRAPE-6Af, if some softening in the force calculation is accepted. Without softening and for very small integration time steps the GRAPE still outperforms the GPU. We conclude that modern GPUs offer an attractive alternative to GRAPE-6Af special purpose hardware. Using the same time-step criterion, the total energy of the $N$-body system was conserved to better than one part in $10^6$ on the GPU, only about an order of magnitude worse than obtained with GRAPE-6Af. For $N \gtrsim 10^5$ the 8800GTX outperforms the host CPU by a factor of about 100 and runs at about the same speed as the GRAPE-6Af.

Submitted 16 July, 2007; v1 submitted 3 July, 2007; originally announced July 2007.

Comments: Accepted for publication in New Astronomy

Journal ref: New Astron. 13:103-112, 2008

Showing 1–19 of 19 results for author: Bedorf, J