Q(D)O-ES: Population-based Quality (Diversity) Optimisation for Post Hoc Ensemble Selection in AutoML

Purucker, Lennart; Schneider, Lennart; Anastacio, Marie; Beel, Joeran; Bischl, Bernd; Hoos, Holger

arXiv:2307.08364 (cs)

[Submitted on 17 Jul 2023 (v1), last revised 2 Aug 2023 (this version, v2)]

Abstract: Automated machine learning (AutoML) systems commonly ensemble models post hoc to improve predictive performance, typically via greedy ensemble selection (GES). However, we believe that GES may not always be optimal, as it performs a simple deterministic greedy search. In this work, we introduce two novel population-based ensemble selection methods, QO-ES and QDO-ES, and compare them to GES. While QO-ES optimises solely for predictive performance, QDO-ES also considers the diversity of ensembles within the population, maintaining a diverse set of well-performing ensembles during optimisation based on ideas from quality diversity optimisation. The methods are evaluated on 71 classification datasets from the AutoML benchmark, demonstrating that QO-ES and QDO-ES often outrank GES, although the differences are statistically significant only on validation data. Our results further suggest that diversity can be beneficial for post hoc ensembling but also increases the risk of overfitting.
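
To make the "simple deterministic greedy search" concrete, the following is a minimal sketch of greedy ensemble selection with replacement, as it is commonly described. It is not taken from the paper: the function name, the argument layout, the use of classification error as the validation metric, and the fixed iteration count are all assumptions made for illustration.

```python
import numpy as np


def greedy_ensemble_selection(val_probas, y_val, n_iterations=10):
    """Illustrative greedy ensemble selection with replacement.

    val_probas: array of shape (n_models, n_samples, n_classes) holding each
                base model's predicted probabilities on validation data.
    y_val:      integer class labels of shape (n_samples,).
    Returns one weight per base model; the weights sum to 1.

    Assumption: validation classification error is the metric being minimised.
    """
    val_probas = np.asarray(val_probas, dtype=float)
    y_val = np.asarray(y_val)
    n_models = val_probas.shape[0]

    counts = np.zeros(n_models, dtype=int)      # how often each model was picked
    ensemble_sum = np.zeros(val_probas.shape[1:], dtype=float)

    for step in range(1, n_iterations + 1):
        # Averaged ensemble prediction if each candidate model were added next.
        candidates = (ensemble_sum[None, :, :] + val_probas) / step
        errors = (candidates.argmax(axis=2) != y_val[None, :]).mean(axis=1)

        # argmin breaks ties by lowest index, so the search is deterministic.
        best = int(errors.argmin())
        counts[best] += 1
        ensemble_sum += val_probas[best]

    return counts / counts.sum()
```

Each step commits to the single best addition and never revisits earlier choices, so the search follows one trajectory. A population-based method, by contrast, keeps several candidate ensembles alive at once.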

Comments:	10 pages main paper, 24 pages references and appendix, 4 figures, 16 subfigures, 13 tables, to be published in: International Conference on Automated Machine Learning 2023; affiliations corrected. arXiv admin note: text overlap with arXiv:2307.00286
Subjects:	Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
ACM classes:	I.2.6; I.5.1
Cite as:	arXiv:2307.08364 [cs.LG]
	(or arXiv:2307.08364v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2307.08364

Submission history

From: Lennart Purucker
[v1] Mon, 17 Jul 2023 10:02:01 UTC (93 KB)
[v2] Wed, 2 Aug 2023 16:09:56 UTC (93 KB)
