A Parallel Scan Algorithm in the Tensor Core Unit Model
Abstract: We present a parallel scan (prefix sum) algorithm in the Tensor Core Unit (TCU) model of computation. The TCU model assumes that multiplication between two square matrices of constant size $s$ is a basic operation. In the $(s^2, \ell)$-TCU model, we show that for inputs of size $n$, the algorithm has depth at most $2\lfloor \log_s (n)\rfloor$ and runs in…
Submitted 26 November, 2024; originally announced November 2024.
Comments: 14 pages, published in the 29th International European Conference on Parallel and Distributed Computing (EuroPar 2023)
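
Illustration: the abstract treats an $s \times s$ matrix product as the basic operation. The sketch below shows one well-known way a prefix sum over a single tile of $s^2$ values can be written with such products, using the identity $AU + LAJ$, where $U$ is upper-triangular ones, $L$ is strictly lower-triangular ones, and $J$ is all ones. It is not taken from the paper and is not claimed to be the authors' algorithm. The names `matmul`, `tile_scan`, and the helper matrices are hypothetical, and a plain Python loop stands in for the hardware primitive.

```python
def matmul(a, b):
    """Plain s x s matrix product; stands in for the TCU basic operation (assumption)."""
    s = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(s)) for j in range(s)]
            for i in range(s)]


def tile_scan(values, s):
    """Inclusive prefix sum of at most s*s values via s x s matrix products.

    Illustrative sketch only: handles one tile and does not recurse over
    larger inputs.
    """
    if len(values) > s * s:
        raise ValueError("this sketch handles a single s*s tile only")

    # Pad with zeros and lay the input out row by row as an s x s matrix A.
    padded = list(values) + [0] * (s * s - len(values))
    A = [padded[r * s:(r + 1) * s] for r in range(s)]

    U = [[1 if i <= j else 0 for j in range(s)] for i in range(s)]  # upper-triangular ones
    L = [[1 if j < i else 0 for j in range(s)] for i in range(s)]   # strictly lower ones
    J = [[1] * s for _ in range(s)]                                 # all ones

    # (A U)[i][j] = sum of A[i][0..j]: a prefix sum within each row.
    row_scans = matmul(A, U)
    # (L A J)[i][j] = sum of every entry in rows above row i: the carry into row i.
    carries = matmul(matmul(L, A), J)

    flat = [row_scans[i][j] + carries[i][j] for i in range(s) for j in range(s)]
    return flat[:len(values)]
```

For example, `tile_scan([1, 2, 3, 4, 5, 6, 7, 8, 9], 3)` uses three matrix products and returns `[1, 3, 6, 10, 15, 21, 28, 36, 45]`.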