Parallel $k$d-tree with Batch Updates
Authors:
Ziyang Men,
Zheqi Shen,
Yan Gu,
Yihan Sun
Abstract:
The $k$d-tree is one of the most widely used data structures to manage multi-dimensional data. Due to the ever-growing data volume, it is imperative to consider parallelism in $k$d-trees. However, we observed challenges in existing parallel kd-tree implementations, for both constructions and updates.
The goal of this paper is to develop efficient in-memory $k$d-trees by supporting high paralleli…
▽ More
The $k$d-tree is one of the most widely used data structures to manage multi-dimensional data. Due to the ever-growing data volume, it is imperative to consider parallelism in $k$d-trees. However, we observed challenges in existing parallel kd-tree implementations, for both constructions and updates.
The goal of this paper is to develop efficient in-memory $k$d-trees by supporting high parallelism and cache-efficiency. We propose the Pkd-tree (Parallel $k$d-tree), a parallel $k$d-tree that is efficient both in theory and in practice. The Pkd-tree supports parallel tree construction, batch update (insertion and deletion), and various queries including k-nearest neighbor search, range query, and range count. We proved that our algorithms have strong theoretical bounds in work (sequential time complexity), span (parallelism), and cache complexity. Our key techniques include 1) an efficient construction algorithm that optimizes work, span, and cache complexity simultaneously, and 2) reconstruction-based update algorithms that guarantee the tree to be weight-balanced. With the new algorithmic insights and careful engineering effort, we achieved a highly optimized implementation of the Pkd-tree.
We tested Pkd-tree with various synthetic and real-world datasets, including both uniform and highly skewed data. We compare the Pkd-tree with state-of-the-art parallel $k$d-tree implementations. In all tests, with better or competitive query performance, Pkd-tree is much faster in construction and updates consistently than all baselines. We released our code.
△ Less
Submitted 6 January, 2025; v1 submitted 14 November, 2024;
originally announced November 2024.
Parallel Longest Increasing Subsequence and van Emde Boas Trees
Authors:
Yan Gu,
Ziyang Men,
Zheqi Shen,
Yihan Sun,
Zijin Wan
Abstract:
This paper studies parallel algorithms for the longest increasing subsequence (LIS) problem. Let $n$ be the input size and $k$ be the LIS length of the input. Sequentially, LIS is a simple problem that can be solved using dynamic programming (DP) in $O(n\log n)$ work. However, parallelizing LIS is a long-standing challenge. We are unaware of any parallel LIS algorithm that has optimal…
▽ More
This paper studies parallel algorithms for the longest increasing subsequence (LIS) problem. Let $n$ be the input size and $k$ be the LIS length of the input. Sequentially, LIS is a simple problem that can be solved using dynamic programming (DP) in $O(n\log n)$ work. However, parallelizing LIS is a long-standing challenge. We are unaware of any parallel LIS algorithm that has optimal $O(n\log n)$ work and non-trivial parallelism (i.e., $\tilde{O}(k)$ or $o(n)$ span).
This paper proposes a parallel LIS algorithm that costs $O(n\log k)$ work, $\tilde{O}(k)$ span, and $O(n)$ space, and is much simpler than the previous parallel LIS algorithms. We also generalize the algorithm to a weighted version of LIS, which maximizes the weighted sum for all objects in an increasing subsequence. To achieve a better work bound for the weighted LIS algorithm, we designed parallel algorithms for the van Emde Boas (vEB) tree, which has the same structure as the sequential vEB tree, and supports work-efficient parallel batch insertion, deletion, and range queries.
We also implemented our parallel LIS algorithms. Our implementation is light-weighted, efficient, and scalable. On input size $10^9$, our LIS algorithm outperforms a highly-optimized sequential algorithm (with $O(n\log k)$ cost) on inputs with $k\le 3\times 10^5$. Our algorithm is also much faster than the best existing parallel implementation by Shen et al. (2022) on all input instances.
△ Less
Submitted 17 April, 2023; v1 submitted 21 August, 2022;
originally announced August 2022.