LLM-Forest: Ensemble Learning of LLMs with Graph-Augmented Prompts for Data Imputation

He, Xinrui; Ban, Yikun; Zou, Jiaru; Wei, Tianxin; Cook, Curtiss B.; He, Jingrui

arXiv:2410.21520 (cs)

[Submitted on 28 Oct 2024 (v1), last revised 5 Jan 2025 (this version, v3)]

Abstract: Missing data imputation is a critical challenge in various domains, such as healthcare and finance, where data completeness is vital for accurate analysis. Large language models (LLMs), trained on vast corpora, have shown strong potential in data generation, making them a promising tool for data imputation. However, challenges persist in designing effective prompts for a finetuning-free process and in mitigating the risk of LLM hallucinations. To address these issues, we propose a novel framework, LLM-Forest, which introduces a "forest" of few-shot learning LLM "trees" with confidence-based weighted voting, inspired by ensemble learning (Random Forest). This framework is established on a new concept of bipartite information graphs to identify high-quality relevant neighboring entries with both feature and value granularity. Extensive experiments on 9 real-world datasets demonstrate the effectiveness and efficiency of LLM-Forest.
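
The confidence-based weighted voting mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `weighted_vote`, the `(value, confidence)` pair format, and the example scores are all assumptions made for illustration, and the abstract does not say how confidences are obtained or how numerical features are aggregated. The sketch covers only the categorical case, where each LLM "tree" proposes a value and the proposals are combined by summing confidences.

```python
from collections import defaultdict


def weighted_vote(predictions):
    """Pick the value with the largest total confidence.

    predictions: list of (value, confidence) pairs, one per LLM "tree".
    The pair format is a hypothetical interface, not taken from the paper.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    scores = defaultdict(float)
    for value, confidence in predictions:
        scores[value] += confidence
    # On a tie, max() returns the value that was seen first.
    return max(scores, key=scores.get)


# Hypothetical outputs of three trees for one missing categorical cell.
tree_outputs = [("yes", 0.9), ("no", 0.4), ("no", 0.3)]
imputed = weighted_vote(tree_outputs)
```

In this example one confident tree outweighs two unsure ones, so the result differs from a plain majority vote. That difference is the point of weighting votes by confidence.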

Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2410.21520 [cs.LG]
	(or arXiv:2410.21520v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2410.21520

Submission history

From: Xinrui He
[v1] Mon, 28 Oct 2024 20:42:46 UTC (7,495 KB)
[v2] Mon, 30 Dec 2024 22:37:28 UTC (7,544 KB)
[v3] Sun, 5 Jan 2025 00:33:08 UTC (7,542 KB)
