Stratified Learning: A General-Purpose Statistical Method for Improved Learning under Covariate Shift

Autenrieth, Maximilian; van Dyk, David A.; Trotta, Roberto; Stenning, David C.

doi:10.1002/sam.11643

arXiv:2106.11211 (stat)

[Submitted on 21 Jun 2021 (v1), last revised 17 May 2023 (this version, v2)]

Abstract: We propose a simple, statistically principled, and theoretically justified method to improve supervised learning when the training set is not representative, a situation known as covariate shift. We build upon a well-established methodology in causal inference, and show that the effects of covariate shift can be reduced or eliminated by conditioning on propensity scores. In practice, this is achieved by fitting learners within strata constructed by partitioning the data based on the estimated propensity scores, leading to approximately balanced covariates and much-improved target prediction. We demonstrate the effectiveness of our general-purpose method on two contemporary research questions in cosmology, outperforming state-of-the-art importance weighting methods. We obtain the best reported AUC (0.958) on the updated "Supernovae photometric classification challenge", and we improve upon existing conditional density estimation of galaxy redshift from Sloan Digital Sky Survey (SDSS) data.
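
The procedure described in the abstract (estimate propensity scores, partition the data into strata, fit a learner within each stratum) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a single covariate, a logistic-regression propensity model fitted by gradient descent, equal-sized quantile strata, and a least-squares line as the per-stratum learner. The function names (`fit_logistic`, `stratify`, `fit_line`, `stratified_predict`) and the default of five strata are hypothetical choices made only for illustration.

```python
import math
import random


def sigmoid(z):
    # Clamp the argument so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, labels, steps=500, lr=0.1):
    """1-D logistic regression by batch gradient descent (assumed propensity model)."""
    w, b, n = 0.0, 0.0, len(xs)
    for _ in range(steps):
        gw = gb = 0.0
        for x, s in zip(xs, labels):
            err = sigmoid(w * x + b) - s
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def stratify(scores, n_strata):
    """Assign each score to one of n_strata equal-sized quantile bins (0 = lowest)."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    strata = [0] * len(scores)
    for rank, idx in enumerate(order):
        strata[idx] = rank * n_strata // len(scores)
    return strata


def fit_line(xs, ys):
    """Least-squares line; returns (slope, intercept). Stand-in for any learner."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return slope, my - slope * mx


def stratified_predict(x_src, y_src, x_tgt, n_strata=5):
    """Predict target outcomes using learners fitted within propensity-score strata."""
    # 1. Estimate the propensity score P(source | x) on the pooled data.
    pooled = x_src + x_tgt
    is_src = [1] * len(x_src) + [0] * len(x_tgt)
    w, b = fit_logistic(pooled, is_src)
    scores = [sigmoid(w * x + b) for x in pooled]

    # 2. Partition the pooled data into strata of similar propensity score.
    strata = stratify(scores, n_strata)
    src_strata, tgt_strata = strata[:len(x_src)], strata[len(x_src):]

    # 3. Fit one learner per stratum on source data; predict targets in that stratum.
    fallback = fit_line(x_src, y_src)  # used when a stratum has too few source points
    preds = [0.0] * len(x_tgt)
    for k in range(n_strata):
        xs = [x for x, s in zip(x_src, src_strata) if s == k]
        ys = [y for y, s in zip(y_src, src_strata) if s == k]
        slope, intercept = fit_line(xs, ys) if len(xs) >= 2 else fallback
        for i, s in enumerate(tgt_strata):
            if s == k:
                preds[i] = slope * x_tgt[i] + intercept
    return preds, tgt_strata


if __name__ == "__main__":
    random.seed(0)
    # Source (labelled) and target (unlabelled) covariates come from shifted distributions.
    x_src = [random.gauss(-0.5, 1.0) for _ in range(200)]
    y_src = [x * x + random.gauss(0.0, 0.1) for x in x_src]
    x_tgt = [random.gauss(0.5, 1.0) for _ in range(200)]
    preds, tgt_strata = stratified_predict(x_src, y_src, x_tgt)
    print(len(preds), sorted(set(tgt_strata)))
```

With one covariate the propensity score is monotone in that covariate, so the strata reduce to intervals of it. With several covariates the same steps apply, and the scalar propensity score is what makes the partition tractable.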

Subjects:	Machine Learning (stat.ML); Cosmology and Nongalactic Astrophysics (astro-ph.CO); Machine Learning (cs.LG)
Cite as:	arXiv:2106.11211 [stat.ML]
	(or arXiv:2106.11211v2 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2106.11211
Journal reference:	Stat. Anal. Data Min.: ASA Data Sci. J. 17 (2024), e11643
Related DOI:	https://doi.org/10.1002/sam.11643

Submission history

From: Maximilian Autenrieth
[v1] Mon, 21 Jun 2021 15:53:20 UTC (1,380 KB)
[v2] Wed, 17 May 2023 12:22:56 UTC (1,368 KB)
