Search | arXiv e-print repository

arXiv:2505.17381 [pdf]

Programmable Photonic Unitary Processor Enables Parametrized Differentiable Long-Haul Spatial Division Multiplexed Transmission

Authors: Mitsumasa Nakajima, Kohki Shibahara, Kohei Ikeda, Akira Kawai, Masaya Notomi, Yutaka Miyamoto, Toshikazu Hashimoto

Abstract: The explosive growth of global data traffic demands scalable and energy-efficient optical communication systems. Spatial division multiplexing (SDM) using multicore or multimode fibers is a promising solution to overcome the capacity limit of single-mode fibers. However, long-haul SDM transmission faces significant challenges due to modal dispersion, which imposes heavy computational loads on digital signal processing (DSP) for signal equalization. Here, we propose parameterized SDM transmission, where programmable photonic unitary processors are installed at intermediate nodes. Instead of relying on conventional digital equalization only on the receiver side, our approach enables direct optimization of the SDM transmission channel itself by the programmable unitary processor, which reduces digital post-processing loads. We introduce a gradient-based optimization algorithm using a differentiable SDM transmission model to determine the optimal unitary transformation. As a key enabler, we first implemented a telecom-grade programmable photonic unitary processor, achieving low-loss (2.1 dB fiber-to-fiber), wideband (full C-band), polarization-independent, and high-fidelity (R^2 > 96% across the C-band) operation. We experimentally demonstrate 1300-km transmission using a three-mode fiber, achieving strong agreement between simulation and experiment. The optimized photonic processor significantly reduces modal dispersion and post-processing complexity. Our results establish a scalable framework for integrating photonic computation into the optical layer, enabling more efficient, high-capacity optical networks.

Submitted 22 May, 2025; originally announced May 2025.

arXiv:2007.03761 [pdf]

doi 10.1038/s41598-021-93005-1

Compressive dual-comb spectroscopy

Authors: Akira Kawai, Takahiro Kageyama, Ryoichi Horisaki, Takuro Ideguchi

Abstract: Broadband, high-resolution, and rapid measurement with dual-comb spectroscopy (DCS) generates a large data stream. We numerically demonstrate significant data compression of DCS spectra by using a compressive sensing technique. Our numerical simulation shows a compression rate of more than 100 with 3% error in mole fraction estimation of mid-infrared (MIR) DCS of two molecular species in a broadband (~30 THz) and high-resolution (~115 MHz) condition. We also numerically demonstrate that a massively parallel MIR DCS spectrum of 10 different molecular species can be reconstructed with a compression rate of 10.5 with a transmittance error of 0.003 from the original spectrum.

Submitted 14 February, 2021; v1 submitted 2 July, 2020; originally announced July 2020.

arXiv:1912.03857 [pdf]

doi 10.1038/s42005-020-00420-3

Time-stretch infrared spectroscopy

Authors: Akira Kawai, Kazuki Hashimoto, Venkata Ramaiah Badarla, Takayuki Imamura, Takuro Ideguchi

Abstract: Improving the spectral acquisition rate of broadband mid-infrared spectroscopy promises further advancements in molecular science and technology. Unlike pump-probe spectroscopy, which requires repeated measurements with different pump-probe delays, continuous spectroscopy running at a high spectral acquisition rate enables transient measurements of rapidly changing non-repeating phenomena or statistical analysis of a large amount of spectral data acquired within a short time. Recently, Fourier-transform infrared spectrometers (FT-IR) with rapid delay scan mechanisms, including dual-comb spectrometers, have significantly improved the measurement rate up to ~1 MSpectra/s, which is fundamentally limited by the signal-to-noise ratio. Here, we overcome the limit and demonstrate the fastest continuous broadband vibrational spectrometer, running at 80 MSpectra/s, by implementing a wavelength-swept time-stretch spectroscopy technique in the mid-infrared region. Our proof-of-concept experiment of time-stretch infrared spectroscopy (TS-IR) demonstrates broadband absorption spectroscopy of phenylacetylene from 4.4 to 4.9 μm (2040-2270 cm^-1) at a resolution of 15 nm (7.7 cm^-1) with a superior signal-to-noise ratio of 85 without averaging and a shot-to-shot fluctuation of 1.3%.

Submitted 3 June, 2020; v1 submitted 9 December, 2019; originally announced December 2019.

arXiv:1904.02857 [pdf]

doi 10.1038/s41467-019-12442-9

Complementary Vibrational Spectroscopy

Authors: Kazuki Hashimoto, Venkata Ramaiah Badarla, Akira Kawai, Takuro Ideguchi

Abstract: Vibrational spectroscopy, comprising infrared absorption and Raman scattering spectroscopy, is widely used for label-free optical sensing and imaging in various scientific and industrial fields. Group theory states that the two molecular spectroscopy methods are sensitive to vibrations categorized in different point groups and provide complementary vibrational spectra. Therefore, complete vibrational information cannot be acquired by a single spectroscopic device, which has impeded the full potential of vibrational spectroscopy. Here, we demonstrate simultaneous infrared absorption and Raman scattering spectroscopy that allows us to measure the complete broadband vibrational spectra in the molecular fingerprint region with a single instrument based on an ultrashort pulsed laser. The system is based on dual-modal Fourier-transform spectroscopy enabled by efficient use of nonlinear optical effects. Our proof-of-concept experiment demonstrates rapid, broadband, and high-spectral-resolution measurements of complementary spectra of organic liquids for precise and accurate molecular analysis.

Submitted 4 April, 2019; originally announced April 2019.

arXiv:1602.02832 [pdf, ps, other]

doi 10.1093/pasj/psw018

Hierarchical Tree Algorithm for Collisional N-body Simulations on GRAPE

Authors: Toshiyuki Fukushige, Atsushi Kawai

Abstract: We present an implementation of the hierarchical tree algorithm on the individual timestep algorithm (the Hermite scheme) for collisional $N$-body simulations, running on the GRAPE-9 system, a special-purpose hardware accelerator for gravitational many-body simulations. Such a combination of the tree algorithm and the individual timestep algorithm was not easy on the previous GRAPE system, mainly because its memory addressing scheme was limited to sequential access to a full set of particle data. The present GRAPE-9 system has an indirect memory addressing unit and a particle memory large enough to store all particle data as well as tree node data. The indirect memory addressing unit stores interaction lists for the tree algorithm, which are constructed on the host computer, and, according to the interaction lists, the force pipelines calculate only the necessary interactions. In our implementation, the interaction calculations are significantly reduced compared to direct $N^2$ summation in the original Hermite scheme. For example, we can achieve a speedup of about a factor of 30 (equivalent to about 17 teraflops) over the Hermite scheme for a simulation of an $N=10^6$ system, using hardware with a peak speed of 0.6 teraflops for the Hermite scheme.

Submitted 8 February, 2016; originally announced February 2016.

Comments: 12 pages, 6 figures, PASJ accepted

arXiv:1410.3252 [pdf, ps, other]

A development of an accelerator board dedicated for multi-precision arithmetic operations and its application to Feynman loop integrals

Authors: Shinji Motoki, Hiroshi Daisaka, Naohito Nakasato, Tadashi Ishikawa, Fukuko Yuasa, Toshiyuki Fukushige, Atsushi Kawai, Junichiro Makino

Abstract: Higher-order corrections in perturbative quantum field theory are required for precise theoretical analysis to investigate new physics beyond the Standard Model. This means that we need to evaluate Feynman loop diagrams with multi-loop integrals, which may require multi-precision calculation. We developed a dedicated accelerator system for multi-precision calculation (GRAPE9-MPX). We present performance results of our system for the case of Feynman two-loop box and three-loop self-energy diagrams with multi-precision.

Submitted 30 November, 2014; v1 submitted 13 October, 2014; originally announced October 2014.

Comments: 6 pages, 8 figures, 1 table; submitted to the proceedings of the 16th International workshop on Advanced Computing and Analysis Techniques in physics research (ACAT 2014), Prague

arXiv:1101.4933 [pdf, ps, other]

doi 10.1088/0004-637X/729/1/29

Spatially Resolved Spectroscopic Observations of a Possible E+A Progenitor SDSS J160241.00+521426.9

Authors: Kazuya Matsubayashi, Masafumi Yagi, Tomotsugu Goto, Akira Akita, Hajime Sugai, Atsushi Kawai, Atsushi Shimono, Takashi Hattori

Abstract: In order to investigate the evolution of E+A galaxies, we observed a galaxy SDSS J160241.00+521426.9, a possible E+A progenitor which shows both emission and strong Balmer absorptions, and its neighbor galaxy. We used the integral field spectroscopic mode of the Kyoto Tridimensional Spectrograph (Kyoto3DII), mounted on the University of Hawaii 88-inch telescope located on Mauna Kea, and the slit-spectroscopic mode of the Faint Object Camera and Spectrograph (FOCAS) on the Subaru Telescope. We found a strong Balmer absorption region in the center of the galaxy and an emission-line region located 2 kpc from the center, in the direction of its neighbor galaxy. The recession velocities of the galaxy and its neighbor galaxy differ only by 100 km s^-1, which suggests that they are a physical pair and would have been interacting. Comparing observed Lick indices of Balmer lines and color indices with those predicted from stellar population synthesis models, we find that a suddenly quenched star-formation scenario is plausible for the star-formation history of the central region. We consider that star formation started in the galaxy due to galaxy interactions and was quenched in the central region, whereas star formation in a region offset from the center still continues or has begun recently. This work is the first study of a possible E+A progenitor using spatially resolved spectroscopy.

Submitted 25 January, 2011; originally announced January 2011.

Comments: 39 pages, 10 figures and 10 tables, accepted to ApJ

arXiv:0907.2012 [pdf, ps, other]

doi 10.1088/0004-637X/701/2/1636

Galactic Wind in the Nearby Starburst Galaxy NGC 253 Observed with the Kyoto3DII Fabry-Perot Mode

Authors: K. Matsubayashi, H. Sugai, T. Hattori, A. Kawai, S. Ozaki, G. Kosugi, T. Ishigaki, A. Shimono

Abstract: We have observed the central region of the nearby starburst galaxy NGC 253 with the Kyoto Tridimensional Spectrograph II (Kyoto3DII) Fabry-Perot mode in order to investigate the properties of its galactic wind. Since this galaxy has a large inclination, it is easy to observe its galactic wind. We produced the Ha, [N II]6583, and [S II]6716,6731 images, as well as their line ratio maps. The [N II]/Ha ratio in the galactic wind region is larger than those in H II regions in the galactic disk. The [N II]/Ha ratio in the southeastern filament, a part of the galactic wind, is the largest and reaches about 1.5. These large [N II]/Ha ratios are explained by shock ionization/excitation. Using the [S II]/Ha ratio map, we spatially separate the galactic wind region from the starburst region. The kinetic energy of the galactic wind can be sufficiently supplied by supernovae in a starburst region in the galactic center. The shape of the galactic wind and the line ratio maps are non-axisymmetric about the galactic minor axis, which is also seen in M82. In the [N II]6583/[S II]6716,6731 map, the positions with large ratios coincide with the positions of star clusters found in the Hubble Space Telescope (HST) observation. This means that intense star formation causes strong nitrogen enrichment in these regions. Our unique line ratio map data, including the [S II] lines, have demonstrated their effectiveness for clearly distinguishing between shocked gas regions and starburst regions, determining the extent of the galactic wind and its mass and kinetic energy, and discovering regions with enhanced nitrogen abundance.

Submitted 12 July, 2009; originally announced July 2009.

Comments: 22 pages, 5 figures, 1 table, accepted for publication in ApJ

Journal ref: Astrophys.J.701:1636-1643,2009

arXiv:0801.1109 [pdf, ps, other]

doi 10.1111/j.1365-2966.2008.12916.x

Integrated field spectroscopy of E+A (post-starburst) galaxies with the Kyoto3DII

Authors: Tomotsugu Goto, Atsushi Kawai, Atsushi Shimono, Hajime Sugai, Masafumi Yagi, Takashi Hattori

Abstract: We have performed two-dimensional spectroscopy of three nearby E+A (post-starburst) galaxies with the Kyoto3DII integral field spectrograph. In all cases, Hdelta absorption is stronger at the centre of the galaxies, but significantly extended on a scale of a few kpc. For one galaxy (J1656), we found a close companion galaxy at the same redshift. The galaxy turned out to be a star-forming galaxy with strong emission in Hgamma. For the other two galaxies, we have found that the central post-starburst regions possibly extend toward the direction of the tidal tails. Our results are consistent with the merger/interaction origin of E+A galaxies, where infalling gas, possibly caused by a galaxy-galaxy merger, creates a central starburst, succeeded by a post-starburst (E+A) phase once the gas is depleted.

Submitted 7 January, 2008; originally announced January 2008.

Comments: Accepted for publication in MNRAS

arXiv:astro-ph/0702392 [pdf, ps, other]

doi 10.1086/513731

Integral Field Spectroscopy of the Quadruply Lensed Quasar 1RXS J1131-1231: New Light on Lens Substructures

Authors: H. Sugai, A. Kawai, A. Shimono, T. Hattori, G. Kosugi, N. Kashikawa, K. T. Inoue, M. Chiba

Abstract: We have observed the quadruply lensed quasar 1RXS J1131-1231 with the integral field spectrograph mode of the Kyoto Tridimensional Spectrograph II mounted on the Subaru telescope. Its field of view simultaneously covered the three brighter lensed images A, B, and C, which are known to exhibit anomalous flux ratios in their continuum emission. We have found that the [OIII] line flux ratios among these lensed images are consistent with those predicted by smooth-lens models. The absence of both microlensing and millilensing effects on this [OIII] narrow line region sets important limits on the mass of any substructures along the line of sight, which is expressed as M_E < 10^5 M_solar for the mass inside an Einstein radius. In contrast, the H_beta line emission, which originates from the broad line region, shows an anomaly in the flux ratio between images B and C, i.e., a factor of two smaller C/B ratio than predicted by smooth-lens models. The ratio of A/B in the H_beta line is well reproduced. We show that the anomalous C/B ratio for the H_beta line is most likely caused by micro/milli-lensing of image C. This is because other effects, such as the differential dust extinction and/or arrival time difference between images B and C, or the simultaneous lensing of another pair of images A and B, are all unlikely. In addition, we have found that the broad H_beta line of image A shows a slight asymmetry in its profile compared with those in the other images, which suggests the presence of a small microlensing effect on this line-emitting region of image A.

Submitted 14 February, 2007; originally announced February 2007.

Comments: 21 pages, 7 figures, 1 table, ApJ accepted

Journal ref: Astrophys.J.660:1016-1022,2007

arXiv:astro-ph/0504407 [pdf, ps, other]

doi 10.1093/pasj/57.6.1009

GRAPE-6A: A single-card GRAPE-6 for parallel PC-GRAPE cluster system

Authors: Toshiyuki Fukushige, Junichiro Makino, Atsushi Kawai

Abstract: In this paper, we describe the design and performance of GRAPE-6A, a special-purpose computer for gravitational many-body simulations. It was designed to be used with a PC cluster, in which each node has one GRAPE-6A. Such a configuration is particularly effective for running the parallel tree algorithm. Though the use of the parallel tree algorithm was possible with the original GRAPE-6 hardware, it was not very cost-effective since a single GRAPE-6 board was still too fast and too expensive. Therefore, we designed GRAPE-6A as a single PCI card to minimize the reproduction cost and optimize the computing speed. The peak performance is 130 Gflops for one GRAPE-6A board and 3.1 Tflops for our 24-node cluster. We describe the implementation of the tree, TreePM and individual timestep algorithms on both a single GRAPE-6A system and a GRAPE-6A cluster. Using the tree algorithm on our 16-node GRAPE-6A system, we can complete a collisionless simulation with 100 million particles (8000 steps) within 10 days.

Submitted 19 April, 2005; originally announced April 2005.

Comments: submitted to PASJ

arXiv:astro-ph/0311179 [pdf, ps, other]

doi 10.1086/381060

A Study of the Distribution of Star-Forming Regions in Luminous Infrared Galaxies by Means of H$α$ Imaging Observations

Authors: T. Hattori, M. Yoshida, H. Ohtani, H. Sugai, T. Ishigaki, M. Sasaki, T. Hayashi, S. Ozaki, M. Ishii, A. Kawai

Abstract: We performed H-alpha imaging observations of 22 luminous infrared galaxies to investigate how the distribution of star-forming regions in these galaxies is related to galaxy interactions. Based on correlation diagrams between H-alpha flux and continuum emission for individual galaxies, a sequence for the distribution of star-forming regions was found: very compact (~100 pc) nuclear starbursts with almost no star-forming activity in the outer regions (type 1), dominant nuclear starbursts < 1 kpc in size and a negligible contribution from the outer regions (type 2), nuclear starbursts > 1 kpc in size and a significant contribution from the outer regions (type 3), and extended starbursts with relatively faint nuclei (type 4). These classes of star-forming region were found to be strongly related to global star-forming properties such as star-formation efficiency, far-infrared color, and dust extinction. There was a clear tendency for the objects with more compact distributions of star-forming regions to show a higher star-formation efficiency and hotter far-infrared color. An appreciable fraction of the sample objects were dominated by extended starbursts (type 4), which is unexpected in the standard scenario of interaction-induced starburst galaxies. We also found that the distribution of star-forming regions was weakly but clearly related to galaxy morphology: severely disturbed objects had a more concentrated distribution of star-forming regions. This suggests that the properties of galaxy interactions, such as dynamical phase and orbital parameters, play a more important role than the internal properties of progenitor galaxies, such as dynamical structure or gas mass fraction. We also discuss the evolution of the distribution of star-forming regions in interacting galaxies.

Submitted 7 November, 2003; originally announced November 2003.

Comments: 44 pages, LaTeX, Accepted by AJ, Version with full-resolution figures available at http://www.oao.nao.ac.jp/support/staff/hattori/lirgs_paper.ps.gz

arXiv:astro-ph/0306203 [pdf, ps, other]

doi 10.1086/383192

Structure of Dark Matter Halos From Hierarchical Clustering. III. Shallowing of The Inner Cusp

Authors: Toshiyuki Fukushige, Atsushi Kawai, Junichiro Makino

Abstract: We investigate the structure of the dark matter halo formed in the cold dark matter scenarios by N-body simulations with a parallel treecode on GRAPE cluster systems. We simulated 8 halos with masses of $4.4\times 10^{14}M_{\odot}$ to $1.6\times 10^{15}M_{\odot}$ in the SCDM and LCDM models using up to 30 million particles. With the resolution of our simulations, the density profile is reliable down to 0.2 percent of the virial radius. Our results show that the slope of the inner cusp within 1 percent of the virial radius is shallower than -1.5, and the radius where the shallowing starts exhibits run-to-run variation, which means the innermost profile is not universal.

Submitted 10 June, 2003; originally announced June 2003.

Comments: 26 pages, 16 figures, submitted to ApJ

Journal ref: Astrophys.J. 606 (2004) 625-634

arXiv:astro-ph/0012041 [pdf, ps, other]

doi 10.1086/319638

Pseudoparticle Multipole Method: A Simple Method to Implement High-Accuracy Treecode

Authors: Atsushi Kawai, Junichiro Makino

Abstract: In this letter we describe the pseudoparticle multipole method (P2M2), a new method to express multipole expansion by a distribution of pseudoparticles. We can use this distribution of particles to calculate high-order terms in both the Barnes-Hut treecode and FMM. The primary advantage of P2M2 is that it works on GRAPE. GRAPE is special-purpose hardware for the calculation of gravitational force between particles. Although the treecode has been implemented on GRAPE, we could handle terms only up to dipole, since GRAPE can calculate forces from point-mass particles only. Thus the calculation cost grows quickly when high accuracy is required. With P2M2, the multipole expansion is expressed by particles, and thus GRAPE can calculate high-order terms. Using P2M2, we implemented an arbitrary-order treecode on GRAPE-4. Timing results show that GRAPE-4 accelerates the calculation by a factor between 10 (for low accuracy) and 150 (for high accuracy). Even on general-purpose programmable computers, our method offers the advantage that the mathematical formulae, and therefore the actual program, are much simpler than those of the direct implementation of multipole expansion.

Submitted 2 December, 2000; originally announced December 2000.

Comments: 6 pages, 4 figures, latex, submitted to ApJ Letters

arXiv:astro-ph/9909116 [pdf, ps, other]

doi 10.1093/pasj/52.4.659

GRAPE-5: A Special-Purpose Computer for N-body Simulation

Authors: Atsushi Kawai, Toshiyuki Fukushige, Junichiro Makino, Makoto Taiji

Abstract: We have developed a special-purpose computer for gravitational many-body simulations, GRAPE-5. GRAPE-5 is the successor of GRAPE-3. Both consist of eight custom pipeline chips (G5 chip and GRAPE chip). The differences between GRAPE-5 and GRAPE-3 are: (1) The G5 chip contains two pipelines operating at 80 MHz, while the GRAPE chip had one at 20 MHz. Thus, the G5 chip and the GRAPE-5 board are 8 times faster than the GRAPE chip and the GRAPE-3 board. (2) The GRAPE-5 board adopted the PCI bus as the interface to the host computer instead of the VME of GRAPE-3, resulting in communication speed one order of magnitude faster. (3) In addition to the pure 1/r potential, the G5 chip can calculate forces with arbitrary cutoff functions, so that it can be applied to Ewald or P^3M methods. (4) The pairwise force calculated on GRAPE-5 is about 10 times more accurate than that on GRAPE-3. On one GRAPE-5 board, one timestep of a 128k-body simulation with the direct summation algorithm takes 14 seconds. With the Barnes-Hut tree algorithm (theta = 0.75), one timestep of a 10^6-body simulation can be done in 16 seconds.

Submitted 7 September, 1999; originally announced September 1999.

Comments: 19 pages, 24 Postscript figures, 3 tables, Latex, submitted to Publications of the Astronomical Society of Japan
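
As a rough consistency check on the GRAPE-5 figures quoted in the abstract above, the following sketch estimates the chip-level speedup over the GRAPE chip and the time for one direct-summation step. It assumes that each pipeline evaluates one pairwise interaction per clock cycle and that "128k" means 131072 particles; neither assumption is stated in the abstract, and the function names are illustrative only.

```python
def chip_speedup(pipes_new, clock_new_hz, pipes_old, clock_old_hz):
    """Ratio of pipeline throughput between two chips (assumed: 1 interaction per clock)."""
    return (pipes_new * clock_new_hz) / (pipes_old * clock_old_hz)


def direct_step_seconds(n_particles, chips, pipes_per_chip, clock_hz):
    """Estimated time for one direct-summation step (N^2 pairwise interactions)."""
    interactions = n_particles ** 2
    interactions_per_second = chips * pipes_per_chip * clock_hz
    return interactions / interactions_per_second


# G5 chip: two pipelines at 80 MHz; GRAPE chip: one pipeline at 20 MHz.
speedup = chip_speedup(2, 80e6, 1, 20e6)

# One GRAPE-5 board: eight G5 chips; 128k is taken here to mean 128 * 1024.
step_time = direct_step_seconds(128 * 1024, chips=8, pipes_per_chip=2, clock_hz=80e6)

print(f"chip speedup: {speedup:.0f}x")
print(f"estimated direct-summation step: {step_time:.1f} s")
```

Under these assumptions the speedup reproduces the factor of 8 stated in the abstract, and the estimated step time comes out slightly below the quoted 14 seconds, which would leave room for host computation and communication overhead.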

arXiv:astro-ph/9906419 [pdf, ps, other]

doi 10.1093/pasj/52.5.943

PROGRAPE-1: A Programmable, Multi-Purpose Computer for Many-Body Simulations

Authors: Tsuyoshi Hamada, Toshiyuki Fukushige, Atsushi Kawai, Junichiro Makino

Abstract: We have developed PROGRAPE-1 (PROgrammable GRAPE-1), a programmable multi-purpose computer for many-body simulations. The main difference between PROGRAPE-1 and "traditional" GRAPE systems is that the former uses FPGA (Field Programmable Gate Array) chips as the processing elements, while the latter rely on a hardwired pipeline processor specialized for gravitational interactions. Since the logic implemented in FPGA chips can be reconfigured, we can use PROGRAPE-1 to calculate not only gravitational interactions but also other forms of interactions such as the van der Waals force, hydrodynamical interactions in SPH calculations, and so on. PROGRAPE-1 comprises two Altera EPF10K100 FPGA chips, each of which contains nominally 100,000 gates. To evaluate the programmability and performance of PROGRAPE-1, we implemented a pipeline for gravitational interaction similar to that of GRAPE-3. One pipeline fitted into a single FPGA chip, which operated at a 16 MHz clock. Thus, for gravitational interaction, PROGRAPE-1 provided a speed of 0.96 Gflops-equivalent. PROGRAPE will prove to be useful for a wide range of particle-based simulations in which the calculation cost of interactions other than gravity is high, such as the evaluation of SPH interactions.

Submitted 8 July, 1999; v1 submitted 25 June, 1999; originally announced June 1999.

Comments: 20 pages with 9 figures; submitted to PASJ
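
The 0.96 Gflops-equivalent figure in the abstract above can be related to the clock rate with a short calculation. The sketch below assumes that both FPGA chips each run one pipeline and that each pipeline completes one pairwise interaction per clock cycle; under those assumptions it derives how many floating-point operations are counted per interaction. The abstract does not state that count, so the result is an inference rather than a quoted value, and the function name is illustrative.

```python
def implied_ops_per_interaction(gflops_equivalent, chips, pipes_per_chip, clock_hz):
    """Operations counted per interaction, assuming 1 interaction per pipeline per clock."""
    interactions_per_second = chips * pipes_per_chip * clock_hz
    return gflops_equivalent * 1e9 / interactions_per_second


# Two FPGA chips, one pipeline each (assumed), 16 MHz clock, 0.96 Gflops-equivalent.
ops = implied_ops_per_interaction(0.96, chips=2, pipes_per_chip=1, clock_hz=16e6)
print(f"implied operations per interaction: {ops:.0f}")
```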

arXiv:astro-ph/9905101 [pdf, ps, other]

$7.0/Mflops Astrophysical N-Body Simulation with Treecode on GRAPE-5

Authors: Atsushi Kawai, Toshiyuki Fukushige, Junichiro Makino

Abstract: As an entry for the 1999 Gordon Bell price/performance prize, we report an astrophysical N-body simulation performed with a treecode on the GRAPE-5 (Gravity Pipe 5) system, a special-purpose computer for astrophysical N-body simulations. The GRAPE-5 system has 32 pipeline processors specialized for the gravitational force calculation. Other operations, such as tree construction, tree traversal and time integration, are performed on a general-purpose workstation. The total cost of the GRAPE-5 system is 40,900 dollars. We performed a cosmological N-body simulation with 2.1 million particles, which sustained a performance of 5.92 Gflops averaged over 8.37 hours. The price per performance obtained is 7.0 dollars per Mflops.

Submitted 24 November, 1999; v1 submitted 8 May, 1999; originally announced May 1999.

Comments: 7 pages, 4 figures, 1999 Gordon Bell Prize Winner (price/performance category) Performance figures are improved. Two photographs are added

arXiv:astro-ph/9812431 [pdf, ps, other]

A Simple Formulation of the Fast Multipole Method: Pseudo-Particle Multipole Method

Authors: Atsushi Kawai, Junichiro Makino

Abstract: We present the pseudo-particle multipole method (P2M2), a new method to handle multipole expansion in fast multipole method and treecode. This method uses a small number of pseudo-particles to express multipole expansion. With this method, the implementation of FMM and treecode with high-order multipole terms is greatly simplified. We applied P2M2 to treecode and combined it with special-purpose computer GRAPE. Extensive tests on the accuracy and calculation cost demonstrate that the new method is quite attractive.

Submitted 23 December, 1998; originally announced December 1998.

Comments: 13 pages, 6 figures, uses aaspp4.sty, to appear in the proceedings of the ninth SIAM conference on parallel processing for scientific computing, Texas, March 1999

arXiv:astro-ph/9707079 [pdf, ps, other]

doi 10.1093/pasj/49.5.607

The PCI Interface for GRAPE Systems: PCI-HIB

Authors: A. Kawai, T. Fukushige, M. Taiji, J. Makino, D. Sugimoto

Abstract: We developed a PCI interface for GRAPE systems. GRAPE(GRAvity piPE) is a special-purpose computer for gravitational N-body simulations. A GRAPE system consists of GRAPE processor boards and a host computer. GRAPE processors perform the calculation of gravitational forces between particles. The host computer performs the rest of calculations. The newest of GRAPE machines, the GRAPE-4, achieved the peak performance of 1.08 Tflops. The GRAPE-4 system uses TURBOChannel for the interface to the host, which limits the selection of the host computer. The TURBOChannel bus is not supported by any of recent workstations. We developed a new host interface board which adopts the PCI bus instead of the TURBOChannel. PCI is an I/O bus standard developed by Intel. It has fairly high peak transfer speed, and is available on wide range of computers, from PCs to supercomputers. Thus, the new interface allows us to connect GRAPE-4 to a wide variety of host computers. In test runs with a Barnes-Hut treecode, we found that the performance of new system with PCI interface is 40% better than that of the original system.

Submitted 16 July, 1997; v1 submitted 7 July, 1997; originally announced July 1997.

Comments: 15 pages, 10 Postscript figures, 3 tables, Latex, submitted to Publications of the Astronomical Society of Japan. corrected figure 2 which contained non standard fonts

Showing 1–19 of 19 results for author: Kawai, A