DILI: A Distribution-Driven Learned Index (Extended version)

Li, Pengfei; Lu, Hua; Zhu, Rong; Ding, Bolin; Yang, Long; Pan, Gang

arXiv:2304.08817 (cs)

[Submitted on 18 Apr 2023 (v1), last revised 18 May 2023 (this version, v2)]

Abstract: Targeting in-memory one-dimensional search keys, we propose a novel DIstribution-driven Learned Index tree (DILI), in which each node uses a concise and computation-efficient linear regression model. An internal node's key range is divided equally among its child nodes, so a key search enjoys perfect model prediction accuracy when locating the relevant leaf node. A leaf node uses machine learning models to generate a searchable data layout and thus accurately predicts the data record position for a key. To construct DILI, we first build a bottom-up tree with linear regression models according to global and local key distributions. Using the bottom-up tree, we then build DILI in a top-down manner, individualizing the fanouts of internal nodes according to local distributions. DILI strikes a good balance between the number of leaf nodes and the height of the tree, two critical factors in key search time. Moreover, we design flexible algorithms for DILI that efficiently insert and delete keys and automatically adjust the tree structure when necessary. Extensive experimental results show that DILI outperforms the state-of-the-art alternatives on different kinds of workloads.
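To make the two node types in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a single internal node with a fixed fanout over a known key range, distinct numeric keys, and leaves that store a sorted array with a least-squares line plus a local correction search. The paper's leaves instead generate their own data layout, and its fanouts are chosen per node from the bottom-up tree; neither is reproduced here. The names `Leaf`, `Internal`, `child_index`, and `find` are hypothetical.

```python
from bisect import bisect_left


class Leaf:
    """Sorted keys plus a least-squares line mapping key -> approximate position.

    Simplification: the leaf keeps a plain sorted array and corrects the
    model's guess with a local search. Keys are assumed to be distinct.
    """

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        self.slope, self.intercept = 0.0, 0.0
        if n > 0:
            mean_key = sum(self.keys) / n
            mean_pos = (n - 1) / 2
            var = sum((k - mean_key) ** 2 for k in self.keys)
            cov = sum((k - mean_key) * (i - mean_pos)
                      for i, k in enumerate(self.keys))
            self.slope = cov / var if var else 0.0
            self.intercept = mean_pos - self.slope * mean_key

    def find(self, key):
        """Return the position of key in this leaf, or None if absent."""
        n = len(self.keys)
        if n == 0:
            return None
        guess = int(round(self.slope * key + self.intercept))
        guess = min(max(guess, 0), n - 1)
        # Widen a window around the guess until it brackets the key.
        lo, step = guess, 1
        while lo > 0 and self.keys[lo] > key:
            lo = max(lo - step, 0)
            step *= 2
        hi, step = guess, 1
        while hi < n - 1 and self.keys[hi] < key:
            hi = min(hi + step, n - 1)
            step *= 2
        i = bisect_left(self.keys, key, lo, hi + 1)
        return i if i < n and self.keys[i] == key else None


class Internal:
    """Internal node whose key range [lo, hi) is split equally among children.

    Because the split is equal-width, the linear model
    (key - lo) * fanout / (hi - lo) gives the child index directly,
    with no search inside the node.
    """

    def __init__(self, keys, lo, hi, fanout):
        self.lo, self.fanout = lo, fanout
        self.scale = fanout / (hi - lo)
        buckets = [[] for _ in range(fanout)]
        for k in keys:
            buckets[self.child_index(k)].append(k)
        self.children = [Leaf(b) for b in buckets]

    def child_index(self, key):
        i = int((key - self.lo) * self.scale)
        return min(max(i, 0), self.fanout - 1)  # clamp out-of-range keys

    def find(self, key):
        """Return (child index, position in leaf), or None if key is absent."""
        c = self.child_index(key)
        pos = self.children[c].find(key)
        return None if pos is None else (c, pos)


root = Internal([3, 8, 15, 16, 23, 42, 57, 61, 88, 99], lo=0, hi=100, fanout=4)
print(root.find(42))  # (1, 0): second child, first slot
print(root.find(16))  # (0, 3): first child, fourth slot
```

The point of the sketch is the asymmetry between the two levels: routing through the internal node is a single multiply and clamp, while the leaf's model only needs to be accurate enough that the correction search stays short.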

Comments:	PVLDB Volume 16
Subjects:	Databases (cs.DB)
Cite as:	arXiv:2304.08817 [cs.DB]
	(or arXiv:2304.08817v2 [cs.DB] for this version)
	https://doi.org/10.48550/arXiv.2304.08817

Submission history

From: Pengfei Li
[v1] Tue, 18 Apr 2023 08:27:24 UTC (669 KB)
[v2] Thu, 18 May 2023 11:53:41 UTC (681 KB)
