Showing 1–9 of 9 results for author: Otomo, H

  1. arXiv:2410.16633  [pdf, other]

    cs.CL cs.AI

    Graph-Structured Trajectory Extraction from Travelogues

    Authors: Aitaro Yamamoto, Hiroyuki Otomo, Hiroki Ouchi, Shohei Higashiyama, Hiroki Teranishi, Hiroyuki Shindo, Taro Watanabe

    Abstract: Previous studies on sequence-based extraction of human movement trajectories have an issue of inadequate trajectory representation. Specifically, a pair of locations may not be lined up in a sequence especially when one location includes the other geographically. In this study, we propose a graph representation that retains information on the geographic hierarchy as well as the temporal order of v…

    Submitted 21 October, 2024; originally announced October 2024.

  2. arXiv:2308.15152  [pdf, other]

    cs.DC

    Reducing shared memory footprint to leverage high throughput on Tensor Cores and its flexible API extension library

    Authors: Hiroyuki Ootomo, Rio Yokota

    Abstract: NVIDIA Tensor Core is a mixed-precision matrix-matrix multiplication and addition computing unit, where the theoretical peak performance is more than 300 TFlop/s on NVIDIA A100 GPU. NVIDIA provides WMMA API for using Tensor Cores in custom kernel functions. The most common way to use Tensor Core is to supply the input matrices from shared memory, which has higher bandwidth than global memory. Howe…

    Submitted 29 August, 2023; originally announced August 2023.

    Comments: HPC Asia 2023

  3. arXiv:2308.15136  [pdf, other]

    cs.DS cs.CV cs.DB cs.DC cs.IR

    CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs

    Authors: Hiroyuki Ootomo, Akira Naruse, Corey Nolet, Ray Wang, Tamas Feher, Yong Wang

    Abstract: Approximate Nearest Neighbor Search (ANNS) plays a critical role in various disciplines spanning data mining and artificial intelligence, from information retrieval and computer vision to natural language processing and recommender systems. Data volumes have soared in recent years and the computational cost of an exhaustive exact nearest neighbor search is often prohibitive, necessitating the adop…

    Submitted 8 July, 2024; v1 submitted 29 August, 2023; originally announced August 2023.

    Comments: Accepted to ICDE 2024

  4. arXiv:2306.11975  [pdf, other]

    cs.DC

    DGEMM on Integer Matrix Multiplication Unit

    Authors: Hiroyuki Ootomo, Katsuhisa Ozaki, Rio Yokota

    Abstract: Deep learning hardware achieves high throughput and low power consumption by reducing computing precision and specializing in matrix multiplication. For machine learning inference, fixed-point value computation is commonplace, where the input and output values and the model parameters are quantized. Thus, many processors are now equipped with fast integer matrix multiplication units (IMMU). It is…

    Submitted 30 March, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: Accepted by IJHPCA: https://journals.sagepub.com/doi/10.1177/10943420241239588

  5. arXiv:2305.13844  [pdf, other]

    cs.CL

    Arukikata Travelogue Dataset with Geographic Entity Mention, Coreference, and Link Annotation

    Authors: Shohei Higashiyama, Hiroki Ouchi, Hiroki Teranishi, Hiroyuki Otomo, Yusuke Ide, Aitaro Yamamoto, Hiroyuki Shindo, Yuki Matsuda, Shoko Wakamiya, Naoya Inoue, Ikuya Yamada, Taro Watanabe

    Abstract: Geoparsing is a fundamental technique for analyzing geo-entity information in text. We focus on document-level geoparsing, which considers geographic relatedness among geo-entity mentions, and present a Japanese travelogue dataset designed for evaluating document-level geoparsing systems. Our dataset comprises 200 travelogue documents with rich geo-entity information: 12,171 mentions, 6,339 coref…

    Submitted 23 May, 2023; originally announced May 2023.

  6. arXiv:2304.04612  [pdf, other]

    cs.DC

    Mixed-Precision Random Projection for RandNLA on Tensor Cores

    Authors: Hiroyuki Ootomo, Rio Yokota

    Abstract: Random projection can reduce the dimension of data while capturing its structure and is a fundamental tool for machine learning, signal processing, and information retrieval, which deal with a large amount of data today. RandNLA (Randomized Numerical Linear Algebra) leverages random projection to reduce the computational complexity of low-rank decomposition of tensors and solve least-square proble…

    Submitted 10 April, 2023; originally announced April 2023.

    Comments: PASC'23

  7. arXiv:2303.08989  [pdf, other]

    quant-ph cs.DC

    Quantum Circuit Simulation by SGEMM Emulation on Tensor Cores and Automatic Precision Selection

    Authors: Hiroyuki Ootomo, Hidetaka Manabe, Kenji Harada, Rio Yokota

    Abstract: Quantum circuit simulation provides the foundation for the development of quantum algorithms and the verification of quantum supremacy. Among the various methods for quantum circuit simulation, tensor network contraction has been increasing in popularity due to its ability to simulate a larger number of qubits. During tensor contraction, the input tensors are reshaped to matrices and computed by a…

    Submitted 10 July, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: This paper has been accepted to ISC'23

  8. arXiv:2301.06672  [pdf, other]

    cs.DC

    Custom 8-bit floating point value format for reducing shared memory bank conflict in approximate nearest neighbor search

    Authors: Hiroyuki Ootomo, Akira Naruse

    Abstract: The k-nearest neighbor search is used in various applications such as machine learning, computer vision, database search, and information retrieval. While the computational cost of the exact nearest neighbor search is enormous, an approximate nearest neighbor search (ANNS) has been attracting much attention. IVFPQ is one of the ANNS methods. Although we can leverage the high bandwidth and low late…

    Submitted 16 January, 2023; originally announced January 2023.

    Comments: Extended "extended abstract of the SC22 research poster"

  9. Recovering single precision accuracy from Tensor Cores while surpassing the FP32 theoretical peak performance

    Authors: Hiroyuki Ootomo, Rio Yokota

    Abstract: Tensor Core is a mixed-precision matrix-matrix multiplication unit on NVIDIA GPUs with a theoretical peak performance of more than 300 TFlop/s on Ampere architectures. Tensor Cores were developed in response to the high demand of dense matrix multiplication from machine learning. However, many applications in scientific computing such as preconditioners for iterative solvers and low-precision Four…

    Submitted 18 October, 2023; v1 submitted 7 March, 2022; originally announced March 2022.