Forest-Guided Clustering -- Shedding Light into the Random Forest Black Box

Sousa, Lisa Barros de Andrade e; Miller, Gregor; Gleut, Ronan Le; Thalmeier, Dominik; Pelin, Helena; Piraud, Marie

arXiv:2507.19455 (cs)

[Submitted on 25 Jul 2025]

Abstract: As machine learning models are increasingly deployed in sensitive application areas, the demand for interpretable and trustworthy decision-making has increased. Random Forests (RF), despite their widespread use and strong performance on tabular data, remain difficult to interpret due to their ensemble nature. We present Forest-Guided Clustering (FGC), a model-specific explainability method that reveals both local and global structure in RFs by grouping instances according to shared decision paths. FGC produces human-interpretable clusters aligned with the model's internal logic and computes cluster-specific and global feature importance scores to derive decision rules underlying RF predictions. FGC accurately recovered latent subclass structure on a benchmark dataset and outperformed classical clustering and post-hoc explanation methods. Applied to an AML transcriptomic dataset, FGC uncovered biologically coherent subpopulations, disentangled disease-relevant signals from confounders, and recovered known and novel gene expression patterns. FGC bridges the gap between performance and interpretability by providing structure-aware insights that go beyond feature-level attribution.
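
To make "grouping instances according to shared decision paths" concrete, here is a minimal sketch under stated assumptions. It is not the FGC implementation. It assumes that path sharing can be approximated by the classical random forest proximity (the fraction of trees in which two samples land in the same leaf), and it uses a simple threshold-based grouping as a stand-in for whatever clustering algorithm the paper applies. The leaf assignments, the function names `proximity_matrix` and `group_by_threshold`, and the threshold value are all hypothetical.

```python
from itertools import combinations


def proximity_matrix(leaves):
    """Fraction of trees in which each pair of samples shares a leaf.

    leaves[i][t] is the index of the leaf that sample i reaches in tree t.
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    prox = [[1.0] * n for _ in range(n)]  # a sample always shares leaves with itself
    for i, j in combinations(range(n), 2):
        shared = sum(a == b for a, b in zip(leaves[i], leaves[j]))
        prox[i][j] = prox[j][i] = shared / n_trees
    return prox


def group_by_threshold(prox, threshold):
    """Stand-in clustering: connected components of the graph that links
    two samples whenever their proximity is at least `threshold`."""
    n = len(prox)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if prox[i][j] >= threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical leaf assignments for 5 samples in a forest of 4 trees.
leaves = [
    [0, 2, 1, 3],
    [0, 2, 1, 0],
    [0, 2, 4, 3],
    [5, 1, 2, 6],
    [5, 1, 2, 7],
]

prox = proximity_matrix(leaves)
labels = group_by_threshold(prox, threshold=0.5)
print(labels)  # [0, 0, 0, 1, 1]
```

With scikit-learn, a leaf matrix of this shape could be obtained from a fitted forest via `RandomForestClassifier.apply(X)`, which returns one leaf index per sample and tree. The threshold grouping is chosen here only because it needs no dependencies; any clustering method that accepts a precomputed distance such as `1 - proximity` could take its place.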

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2507.19455 [cs.LG]
	(or arXiv:2507.19455v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2507.19455

Submission history

From: Lisa Barros De Andrade E Sousa
[v1] Fri, 25 Jul 2025 17:41:39 UTC (17,347 KB)
