Transformers Boost the Performance of Decision Trees on Tabular Data across Sample Sizes

Jayawardhana, Mayuka; Renbo; Dooley, Samuel; Cherepanova, Valeriia; Wilson, Andrew Gordon; Hutter, Frank; White, Colin; Goldstein, Tom; Goldblum, Micah


arXiv:2502.02672 (cs)

[Submitted on 4 Feb 2025 (v1), last revised 6 Feb 2025 (this version, v2)]



Abstract: Large language models (LLMs) perform remarkably well on tabular datasets in zero- and few-shot settings, since they can extract meaning from natural language column headers that describe features and labels. Similarly, TabPFN, a recent non-LLM transformer pretrained on numerous tables for in-context learning, has demonstrated excellent performance for dataset sizes up to a thousand samples. In contrast, gradient-boosted decision trees (GBDTs) are typically trained from scratch on each dataset without benefiting from pretraining data and must learn the relationships between columns from their entries alone since they lack natural language understanding. LLMs and TabPFN excel on small tabular datasets where a strong prior is essential, yet they are not competitive with GBDTs on medium or large datasets, since their context lengths are limited. In this paper, we propose a simple and lightweight approach for fusing large language models and TabPFN with gradient-boosted decision trees, which allows scalable GBDTs to benefit from the natural language capabilities and pretraining of transformers. We name our fusion methods LLM-Boost and PFN-Boost, respectively. While matching or surpassing the performance of the transformer at sufficiently small dataset sizes and GBDTs at sufficiently large sizes, LLM-Boost and PFN-Boost outperform both standalone components on a wide range of dataset sizes in between. We demonstrate state-of-the-art performance against numerous baselines and ensembling algorithms. We find that PFN-Boost achieves the best average performance among all methods we test for all but very small dataset sizes. We release our code at this http URL.

Comments:	12 pages, 6 figures
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
ACM classes:	I.2.m; I.2.6; I.2.7
Cite as:	arXiv:2502.02672 [cs.CL]
	(or arXiv:2502.02672v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.02672

Submission history

From: Bethmage Mayuka Jayawardhana
[v1] Tue, 4 Feb 2025 19:30:41 UTC (766 KB)
[v2] Thu, 6 Feb 2025 02:39:35 UTC (766 KB)
