-
Programmable Photonic Unitary Processor Enables Parametrized Differentiable Long-Haul Spatial Division Multiplexed Transmission
Authors:
Mitsumasa Nakajima,
Kohki Shibahara,
Kohei Ikeda,
Akira Kawai,
Masaya Notomi,
Yutaka Miyamoto,
Toshikazu Hashimoto
Abstract:
The explosive growth of global data traffic demands scalable and energy-efficient optical communication systems. Spatial division multiplexing (SDM) using multicore or multimode fibers is a promising solution to overcome the capacity limit of single-mode fibers. However, long-haul SDM transmission faces significant challenges due to modal dispersion, which imposes heavy computational loads on digital signal processing (DSP) for signal equalization. Here, we propose parameterized SDM transmission, where programmable photonic unitary processors are installed at intermediate nodes. Instead of relying on conventional digital equalization only on the receiver side, our approach enables direct optimization of the SDM transmission channel itself by the programmable unitary processor, which reduces digital post-processing loads. We introduce a gradient-based optimization algorithm using a differentiable SDM transmission model to determine the optimal unitary transformation. As a key enabler, we first implemented a telecom-grade programmable photonic unitary processor, achieving low-loss (2.1 dB fiber-to-fiber), wideband (full C-band), polarization-independent, and high-fidelity (R^2 > 96% across the C-band) operation. We experimentally demonstrate 1300-km transmission using a three-mode fiber, achieving strong agreement between simulation and experiment. The optimized photonic processor significantly reduces modal dispersion and post-processing complexity. Our results establish a scalable framework for integrating photonic computation into the optical layer, enabling more efficient, high-capacity optical networks.
Submitted 22 May, 2025;
originally announced May 2025.
-
Compressive dual-comb spectroscopy
Authors:
Akira Kawai,
Takahiro Kageyama,
Ryoichi Horisaki,
Takuro Ideguchi
Abstract:
Broadband, high-resolution, and rapid measurement with dual-comb spectroscopy (DCS) generates a large data stream. We numerically demonstrate significant data compression of DCS spectra by using a compressive sensing technique. Our numerical simulation shows a compression rate of more than 100 with 3% error in mole fraction estimation of mid-infrared (MIR) DCS of two molecular species in a broadband (~30 THz) and high-resolution (~115 MHz) condition. We also numerically demonstrate that a massively parallel MIR DCS spectrum of 10 different molecular species can be reconstructed with a compression rate of 10.5 with a transmittance error of 0.003 from the original spectrum.
Submitted 14 February, 2021; v1 submitted 2 July, 2020;
originally announced July 2020.
-
Time-stretch infrared spectroscopy
Authors:
Akira Kawai,
Kazuki Hashimoto,
Venkata Ramaiah Badarla,
Takayuki Imamura,
Takuro Ideguchi
Abstract:
Improving the spectral acquisition rate of broadband mid-infrared spectroscopy promises further advancements of molecular science and technology. Unlike pump-probe spectroscopy, which requires repeated measurements with different pump-probe delays, continuous spectroscopy running at a high spectral acquisition rate enables transient measurements of rapidly changing non-repeating phenomena or statistical analysis of a large amount of spectral data acquired within a short time. Recently, Fourier-transform infrared spectrometers (FT-IR) with rapid delay scan mechanisms, including dual-comb spectrometers, have significantly improved the measurement rate up to ~1 MSpectra/s, which is fundamentally limited by the signal-to-noise ratio. Here, we overcome the limit and demonstrate the fastest continuous broadband vibrational spectrometer running at 80 MSpectra/s by implementing a wavelength-swept time-stretch spectroscopy technique in the mid-infrared region. Our proof-of-concept experiment of the time-stretch infrared spectroscopy (TS-IR) demonstrates broadband absorption spectroscopy of phenylacetylene from 4.4 to 4.9 μm (2040-2270 cm^-1) at a resolution of 15 nm (7.7 cm^-1) with a superior signal-to-noise ratio of 85 without averaging and a shot-to-shot fluctuation of 1.3%.
Submitted 3 June, 2020; v1 submitted 9 December, 2019;
originally announced December 2019.
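The wavelength and wavenumber figures quoted in the abstract above can be cross-checked with the standard relation between wavelength in μm and wavenumber in cm^-1 (wavenumber = 10^4 / wavelength). The following is a minimal sketch, not the authors' code; the function names are hypothetical, and evaluating the 15 nm resolution at the 4.4 μm band edge is an assumption, since the abstract does not say where in the band the resolution is quoted.

```python
def wavelength_um_to_wavenumber_cm1(wavelength_um: float) -> float:
    """Convert a wavelength in micrometres to a wavenumber in cm^-1."""
    return 1.0e4 / wavelength_um


def resolution_nm_to_cm1(delta_lambda_nm: float, center_um: float) -> float:
    """Convert a small wavelength interval (nm) to a wavenumber interval (cm^-1).

    Uses the first-order relation |d(nu)| = 1e4 * d(lambda) / lambda^2,
    with lambda in micrometres.
    """
    delta_lambda_um = delta_lambda_nm * 1.0e-3
    return 1.0e4 * delta_lambda_um / center_um**2


# Band edges quoted in the abstract: 4.4 to 4.9 micrometres.
low_edge_cm1 = wavelength_um_to_wavenumber_cm1(4.9)
high_edge_cm1 = wavelength_um_to_wavenumber_cm1(4.4)

# Assumption: the 15 nm resolution is evaluated at 4.4 micrometres.
resolution_cm1 = resolution_nm_to_cm1(15.0, 4.4)

print(f"band: {low_edge_cm1:.0f} to {high_edge_cm1:.0f} cm^-1")
print(f"resolution: {resolution_cm1:.2f} cm^-1")
```

Under these assumptions the band edges land close to the quoted 2040-2270 cm^-1, and the resolution comes out near the quoted 7.7 cm^-1.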
-
Complementary Vibrational Spectroscopy
Authors:
Kazuki Hashimoto,
Venkata Ramaiah Badarla,
Akira Kawai,
Takuro Ideguchi
Abstract:
Vibrational spectroscopy, comprised of infrared absorption and Raman scattering spectroscopy, is widely used for label-free optical sensing and imaging in various scientific and industrial fields. Group theory states that the two molecular spectroscopy methods are sensitive to vibrations categorized in different point groups and provide complementary vibrational spectra. Therefore, complete vibrational information cannot be acquired by a single spectroscopic device, which has impeded the full potential of vibrational spectroscopy. Here, we demonstrate simultaneous infrared absorption and Raman scattering spectroscopy that allows us to measure the complete broadband vibrational spectra in the molecular fingerprint region with a single instrument based on an ultrashort pulsed laser. The system is based on dual-modal Fourier-transform spectroscopy enabled by efficient use of nonlinear optical effects. Our proof-of-concept experiment demonstrates rapid, broadband, and high-spectral-resolution measurements of complementary spectra of organic liquids for precise and accurate molecular analysis.
Submitted 4 April, 2019;
originally announced April 2019.
-
Hierarchical Tree Algorithm for Collisional N-body Simulations on GRAPE
Authors:
Toshiyuki Fukushige,
Atsushi Kawai
Abstract:
We present an implementation of the hierarchical tree algorithm on the individual timestep algorithm (the Hermite scheme) for collisional $N$-body simulations, running on the GRAPE-9 system, a special-purpose hardware accelerator for gravitational many-body simulations. Such a combination of the tree algorithm and the individual timestep algorithm was not easy on the previous GRAPE system, mainly because its memory addressing scheme was limited to sequential access to a full set of particle data. The present GRAPE-9 system has an indirect memory addressing unit and a particle memory large enough to store all particle data as well as tree node data. The indirect memory addressing unit stores interaction lists for the tree algorithm, which are constructed on the host computer, and, according to the interaction lists, force pipelines calculate only the necessary interactions. In our implementation, the interaction calculations are significantly reduced compared to direct $N^2$ summation in the original Hermite scheme. For example, we can achieve a speedup of about a factor of 30 (equivalent to about 17 teraflops) against the Hermite scheme for a simulation of an $N=10^6$ system, using hardware with a peak speed of 0.6 teraflops for the Hermite scheme.
Submitted 8 February, 2016;
originally announced February 2016.
-
A development of an accelerator board dedicated for multi-precision arithmetic operations and its application to Feynman loop integrals
Authors:
Shinji Motoki,
Hiroshi Daisaka,
Naohito Nakasato,
Tadashi Ishikawa,
Fukuko Yuasa,
Toshiyuki Fukushige,
Atsushi Kawai,
Junichiro Makino
Abstract:
Higher-order corrections in perturbative quantum field theory are required for precise theoretical analysis to investigate new physics beyond the Standard Model. This means that we need to evaluate Feynman loop diagrams with multi-loop integrals, which may require multi-precision calculation. We developed a dedicated accelerator system for multi-precision calculation (GRAPE9-MPX). We present performance results of our system for the case of Feynman two-loop box and three-loop self-energy diagrams with multi-precision.
Submitted 30 November, 2014; v1 submitted 13 October, 2014;
originally announced October 2014.
-
Spatially Resolved Spectroscopic Observations of a Possible E+A Progenitor SDSS J160241.00+521426.9
Authors:
Kazuya Matsubayashi,
Masafumi Yagi,
Tomotsugu Goto,
Akira Akita,
Hajime Sugai,
Atsushi Kawai,
Atsushi Shimono,
Takashi Hattori
Abstract:
In order to investigate the evolution of E+A galaxies, we observed a galaxy SDSS J160241.00+521426.9, a possible E+A progenitor which shows both emission and strong Balmer absorptions, and its neighbor galaxy. We used the integral field spectroscopic mode of the Kyoto Tridimensional Spectrograph (Kyoto3DII), mounted on the University of Hawaii 88-inch telescope located on Mauna Kea, and the slit-spectroscopic mode of the Faint Object Camera and Spectrograph (FOCAS) on the Subaru Telescope. We found a strong Balmer absorption region in the center of the galaxy and an emission-line region located 2 kpc from the center, in the direction of its neighbor galaxy. The recession velocities of the galaxy and its neighbor galaxy differ only by 100 km s^-1, which suggests that they are a physical pair and would have been interacting. Comparing observed Lick indices of Balmer lines and color indices with those predicted from stellar population synthesis models, we find that a suddenly quenched star-formation scenario is plausible for the star-formation history of the central region. We consider that star formation started in the galaxy due to galaxy interactions and was quenched in the central region, whereas star formation in a region offset from the center still continues or has begun recently. This work is the first study of a possible E+A progenitor using spatially resolved spectroscopy.
Submitted 25 January, 2011;
originally announced January 2011.
-
Galactic Wind in the Nearby Starburst Galaxy NGC 253 Observed with the Kyoto3DII Fabry-Perot Mode
Authors:
K. Matsubayashi,
H. Sugai,
T. Hattori,
A. Kawai,
S. Ozaki,
G. Kosugi,
T. Ishigaki,
A. Shimono
Abstract:
We have observed the central region of the nearby starburst galaxy NGC 253 with the Kyoto Tridimensional Spectrograph II (Kyoto3DII) Fabry-Perot mode in order to investigate the properties of its galactic wind. Since this galaxy has a large inclination, it is easy to observe its galactic wind. We produced the Hα, [N II]6583, and [S II]6716,6731 images, as well as their line ratio maps. The [N II]/Hα ratio in the galactic wind region is larger than those in H II regions in the galactic disk. The [N II]/Hα ratio in the southeastern filament, a part of the galactic wind, is the largest and reaches about 1.5. These large [N II]/Hα ratios are explained by shock ionization/excitation. Using the [S II]/Hα ratio map, we spatially separate the galactic wind region from the starburst region. The kinetic energy of the galactic wind can be sufficiently supplied by supernovae in a starburst region in the galactic center. The shape of the galactic wind and the line ratio maps are non-axisymmetric about the galactic minor axis, which is also seen in M82. In the [N II]6583/[S II]6716,6731 map, the positions with large ratios coincide with the positions of star clusters found in the Hubble Space Telescope (HST) observation. This means that intense star formation causes strong nitrogen enrichment in these regions. Our unique data of the line ratio maps including [S II] lines have demonstrated their effectiveness for clearly distinguishing between shocked gas regions and starburst regions, determining the extent of galactic wind and its mass and kinetic energy, and discovering regions with enhanced nitrogen abundance.
Submitted 12 July, 2009;
originally announced July 2009.
-
Integrated field spectroscopy of E+A (post-starburst) galaxies with the Kyoto3DII
Authors:
Tomotsugu Goto,
Atsushi Kawai,
Atsushi Shimono,
Hajime Sugai,
Masafumi Yagi,
Takashi Hattori
Abstract:
We have performed two-dimensional spectroscopy of three nearby E+A (post-starburst) galaxies with the Kyoto3DII integral field spectrograph. In all cases, Hδ absorption is stronger at the centre of the galaxies, but significantly extended on a scale of a few kpc. For one galaxy (J1656), we found a close companion galaxy at the same redshift. The companion turned out to be a star-forming galaxy with strong emission in Hγ. For the other two galaxies, we have found that the central post-starburst regions possibly extend toward the direction of the tidal tails. Our results are consistent with the merger/interaction origin of E+A galaxies, where the infalling gas possibly caused by a galaxy-galaxy merger creates a central starburst, succeeded by a post-starburst (E+A) phase once the gas is depleted.
Submitted 7 January, 2008;
originally announced January 2008.
-
Integral Field Spectroscopy of the Quadruply Lensed Quasar 1RXS J1131-1231: New Light on Lens Substructures
Authors:
H. Sugai,
A. Kawai,
A. Shimono,
T. Hattori,
G. Kosugi,
N. Kashikawa,
K. T. Inoue,
M. Chiba
Abstract:
We have observed the quadruply lensed quasar 1RXS J1131-1231 with the integral field spectrograph mode of the Kyoto Tridimensional Spectrograph II mounted on the Subaru telescope. Its field of view has covered simultaneously the three brighter lensed images A, B, and C, which are known to exhibit anomalous flux ratios in their continuum emission. We have found that the [OIII] line flux ratios among these lensed images are consistent with those predicted by smooth-lens models. The absence of both microlensing and millilensing effects on this [OIII] narrow line region sets important limits on the mass of any substructures along the line of sight, which is expressed as M_E < 10^5 M_solar for the mass inside an Einstein radius. In contrast, the H_beta line emission, which originates from the broad line region, shows an anomaly in the flux ratio between images B and C, i.e., a factor two smaller C/B ratio than predicted by smooth-lens models. The ratio of A/B in the H_beta line is well reproduced. We show that the anomalous C/B ratio for the H_beta line is caused most likely by micro/milli-lensing of image C. This is because other effects, such as the differential dust extinction and/or arrival time difference between images B and C, or the simultaneous lensing of another pair of images A and B, are all unlikely. In addition, we have found that the broad H_beta line of image A shows a slight asymmetry in its profile compared with those in the other images, which suggests the presence of a small microlensing effect on this line emitting region of image A.
Submitted 14 February, 2007;
originally announced February 2007.
-
GRAPE-6A: A single-card GRAPE-6 for parallel PC-GRAPE cluster system
Authors:
Toshiyuki Fukushige,
Junichiro Makino,
Atsushi Kawai
Abstract:
In this paper, we describe the design and performance of GRAPE-6A, a special-purpose computer for gravitational many-body simulations. It was designed to be used with a PC cluster, in which each node has one GRAPE-6A. Such a configuration is particularly effective in running the parallel tree algorithm. Though the use of the parallel tree algorithm was possible with the original GRAPE-6 hardware, it was not very cost-effective since a single GRAPE-6 board was still too fast and too expensive. Therefore, we designed GRAPE-6A as a single PCI card to minimize the reproduction cost and optimize the computing speed. The peak performance is 130 Gflops for one GRAPE-6A board and 3.1 Tflops for our 24-node cluster. We describe the implementation of the tree, TreePM and individual timestep algorithms on both a single GRAPE-6A system and a GRAPE-6A cluster. Using the tree algorithm on our 16-node GRAPE-6A system, we can complete a collisionless simulation with 100 million particles (8000 steps) within 10 days.
Submitted 19 April, 2005;
originally announced April 2005.
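The cluster peak figure and the run size quoted in the abstract above can be checked with simple arithmetic. This is a rough sketch using only numbers from the abstract; the variable names are illustrative, and the throughput is a lower bound because the abstract says the run completes "within" 10 days.

```python
# Figures quoted in the abstract.
GRAPE6A_PEAK_GFLOPS = 130.0   # peak of one GRAPE-6A board
CLUSTER_NODES = 24            # one board per node

# Aggregate peak of the 24-node cluster, in Tflops.
cluster_peak_tflops = GRAPE6A_PEAK_GFLOPS * CLUSTER_NODES / 1000.0

# Collisionless run on the 16-node system: 100 million particles, 8000 steps.
PARTICLES = 100_000_000
STEPS = 8000
WALL_CLOCK_SECONDS = 10 * 24 * 3600  # 10 days, taken as an upper bound

# Minimum sustained rate of particle updates implied by the quoted run time.
particle_steps_per_second = PARTICLES * STEPS / WALL_CLOCK_SECONDS

print(f"cluster peak: {cluster_peak_tflops:.2f} Tflops")
print(f"throughput: at least {particle_steps_per_second:.3g} particle-steps/s")
```

The aggregate peak agrees with the quoted 3.1 Tflops after rounding, and the run corresponds to roughly 9 x 10^5 particle-steps per second or better.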
-
A Study of the Distribution of Star-Forming Regions in Luminous Infrared Galaxies by Means of Hα Imaging Observations
Authors:
T. Hattori,
M. Yoshida,
H. Ohtani,
H. Sugai,
T. Ishigaki,
M. Sasaki,
T. Hayashi,
S. Ozaki,
M. Ishii,
A. Kawai
Abstract:
We performed H-alpha imaging observations of 22 luminous infrared galaxies to investigate how the distribution of star-forming regions in these galaxies is related to galaxy interactions. Based on correlation diagrams between H-alpha flux and continuum emission for individual galaxies, a sequence for the distribution of star-forming regions was found: very compact (~100 pc) nuclear starbursts with almost no star-forming activity in the outer regions (type 1), dominant nuclear starbursts < 1 kpc in size and a negligible contribution from the outer regions (type 2), nuclear starbursts > 1 kpc in size and a significant contribution from the outer regions (type 3), and extended starbursts with relatively faint nuclei (type 4). These classes of star-forming region were found to be strongly related to global star-forming properties such as star-formation efficiency, far-infrared color, and dust extinction. There was a clear tendency for the objects with more compact distributions of star-forming regions to show a higher star-formation efficiency and hotter far-infrared color. An appreciable fraction of the sample objects were dominated by extended starbursts (type 4), which is unexpected in the standard scenario of interaction-induced starburst galaxies. We also found that the distribution of star-forming regions was weakly but clearly related to galaxy morphology: severely disturbed objects had a more concentrated distribution of star-forming regions. This suggests that the properties of galaxy interactions, such as dynamical phase and orbital parameters, play a more important role than the internal properties of progenitor galaxies, such as dynamical structure or gas mass fraction. We also discuss the evolution of the distribution of star-forming regions in interacting galaxies.
Submitted 7 November, 2003;
originally announced November 2003.
-
Structure of Dark Matter Halos From Hierarchical Clustering. III. Shallowing of The Inner Cusp
Authors:
Toshiyuki Fukushige,
Atsushi Kawai,
Junichiro Makino
Abstract:
We investigate the structure of the dark matter halo formed in the cold dark matter scenarios by N-body simulations with parallel treecode on GRAPE cluster systems. We simulated 8 halos with the mass of $4.4\times 10^{14}M_{\odot}$ to $1.6\times 10^{15}M_{\odot}$ in the SCDM and LCDM model using up to 30 million particles. With the resolution of our simulations, the density profile is reliable down to 0.2 percent of the virial radius. Our results show that the slope of inner cusp within 1 percent virial radius is shallower than -1.5, and the radius where the shallowing starts exhibits run-to-run variation, which means the innermost profile is not universal.
Submitted 10 June, 2003;
originally announced June 2003.
-
Pseudoparticle Multipole Method: A Simple Method to Implement High-Accuracy Treecode
Authors:
Atsushi Kawai,
Junichiro Makino
Abstract:
In this letter we describe the pseudoparticle multipole method (P2M2), a new method to express multipole expansion by a distribution of pseudoparticles. We can use this distribution of particles to calculate high-order terms in both the Barnes-Hut treecode and FMM. The primary advantage of P2M2 is that it works on GRAPE. GRAPE is special-purpose hardware for the calculation of gravitational force between particles. Although the treecode has been implemented on GRAPE, we could handle terms only up to dipole, since GRAPE can calculate forces from point-mass particles only. Thus the calculation cost grows quickly when high accuracy is required. With P2M2, the multipole expansion is expressed by particles, and thus GRAPE can calculate high-order terms. Using P2M2, we implemented an arbitrary-order treecode on GRAPE-4. Timing results show that GRAPE-4 accelerates the calculation by a factor between 10 (for low accuracy) and 150 (for high accuracy). Even on general-purpose programmable computers, our method offers the advantage that the mathematical formulae, and therefore the actual program, are much simpler than those of the direct implementation of multipole expansion.
Submitted 2 December, 2000;
originally announced December 2000.
-
GRAPE-5: A Special-Purpose Computer for N-body Simulation
Authors:
Atsushi Kawai,
Toshiyuki Fukushige,
Junichiro Makino,
Makoto Taiji
Abstract:
We have developed a special-purpose computer for gravitational many-body simulations, GRAPE-5. GRAPE-5 is the successor of GRAPE-3. Both consist of eight custom pipeline chips (G5 chip and GRAPE chip). The differences between GRAPE-5 and GRAPE-3 are: (1) The G5 chip contains two pipelines operating at 80 MHz, while the GRAPE chip had one at 20 MHz. Thus, the calculation speeds of the G5 chip and the GRAPE-5 board are 8 times those of the GRAPE chip and the GRAPE-3 board. (2) The GRAPE-5 board adopts the PCI bus as the interface to the host computer instead of the VME bus of GRAPE-3, making communication one order of magnitude faster. (3) In addition to the pure 1/r potential, the G5 chip can calculate forces with arbitrary cutoff functions, so that it can be applied to Ewald or P^3M methods. (4) The pairwise force calculated on GRAPE-5 is about 10 times more accurate than that on GRAPE-3. On one GRAPE-5 board, one timestep of a 128k-body simulation with the direct summation algorithm takes 14 seconds. With the Barnes-Hut tree algorithm (theta = 0.75), one timestep of a 10^6-body simulation can be done in 16 seconds.
Submitted 7 September, 1999;
originally announced September 1999.
-
PROGRAPE-1: A Programmable, Multi-Purpose Computer for Many-Body Simulations
Authors:
Tsuyoshi Hamada,
Toshiyuki Fukushige,
Atsushi Kawai,
Junichiro Makino
Abstract:
We have developed PROGRAPE-1 (PROgrammable GRAPE-1), a programmable multi-purpose computer for many-body simulations. The main difference between PROGRAPE-1 and "traditional" GRAPE systems is that the former uses FPGA (Field Programmable Gate Array) chips as the processing elements, while the latter rely on a hardwired pipeline processor specialized for gravitational interactions. Since the logic implemented in FPGA chips can be reconfigured, we can use PROGRAPE-1 to calculate not only gravitational interactions but also other forms of interactions such as the van der Waals force, hydrodynamical interactions in SPH calculations, and so on. PROGRAPE-1 comprises two Altera EPF10K100 FPGA chips, each of which contains nominally 100,000 gates. To evaluate the programmability and performance of PROGRAPE-1, we implemented a pipeline for gravitational interaction similar to that of GRAPE-3. One pipeline fitted into a single FPGA chip, which operated at a 16 MHz clock. Thus, for gravitational interaction, PROGRAPE-1 provided a speed of 0.96 Gflops-equivalent. PROGRAPE will prove to be useful for a wide range of particle-based simulations in which the calculation cost of interactions other than gravity is high, such as the evaluation of SPH interactions.
Submitted 8 July, 1999; v1 submitted 25 June, 1999;
originally announced June 1999.
-
$7.0/Mflops Astrophysical N-Body Simulation with Treecode on GRAPE-5
Authors:
Atsushi Kawai,
Toshiyuki Fukushige,
Junichiro Makino
Abstract:
As an entry for the 1999 Gordon Bell price/performance prize, we report an astrophysical N-body simulation performed with a treecode on the GRAPE-5 (Gravity Pipe 5) system, a special-purpose computer for astrophysical N-body simulations. The GRAPE-5 system has 32 pipeline processors specialized for the gravitational force calculation. Other operations, such as tree construction, tree traversal and time integration, are performed on a general-purpose workstation. The total cost for the GRAPE-5 system is 40,900 dollars. We performed a cosmological N-body simulation with 2.1 million particles, which sustained a performance of 5.92 Gflops averaged over 8.37 hours. The price per performance obtained is 7.0 dollars per Mflops.
Submitted 24 November, 1999; v1 submitted 8 May, 1999;
originally announced May 1999.
-
A Simple Formulation of the Fast Multipole Method: Pseudo-Particle Multipole Method
Authors:
Atsushi Kawai,
Junichiro Makino
Abstract:
We present the pseudo-particle multipole method (P2M2), a new method to handle multipole expansion in the fast multipole method (FMM) and treecode. This method uses a small number of pseudo-particles to express multipole expansion. With this method, the implementation of FMM and treecode with high-order multipole terms is greatly simplified. We applied P2M2 to treecode and combined it with the special-purpose computer GRAPE. Extensive tests on the accuracy and calculation cost demonstrate that the new method is quite attractive.
Submitted 23 December, 1998;
originally announced December 1998.
-
The PCI Interface for GRAPE Systems: PCI-HIB
Authors:
A. Kawai,
T. Fukushige,
M. Taiji,
J. Makino,
D. Sugimoto
Abstract:
We developed a PCI interface for GRAPE systems. GRAPE (GRAvity piPE) is a special-purpose computer for gravitational N-body simulations. A GRAPE system consists of GRAPE processor boards and a host computer. GRAPE processors perform the calculation of gravitational forces between particles. The host computer performs the rest of the calculations. The newest of the GRAPE machines, the GRAPE-4, achieved a peak performance of 1.08 Tflops. The GRAPE-4 system uses TURBOChannel for the interface to the host, which limits the selection of the host computer. The TURBOChannel bus is not supported by any recent workstations. We developed a new host interface board which adopts the PCI bus instead of the TURBOChannel. PCI is an I/O bus standard developed by Intel. It has a fairly high peak transfer speed, and is available on a wide range of computers, from PCs to supercomputers. Thus, the new interface allows us to connect GRAPE-4 to a wide variety of host computers. In test runs with a Barnes-Hut treecode, we found that the performance of the new system with the PCI interface is 40% better than that of the original system.
Submitted 16 July, 1997; v1 submitted 7 July, 1997;
originally announced July 1997.