Showing 1–1 of 1 results for author: Schreyer, W M

Search v0.5.6 released 2020-02-24

arXiv:2502.16329

cs.LG

Generalization is not a universal guarantee: Estimating similarity to training data with an ensemble out-of-distribution metric

Authors: W. Max Schreyer, Christopher Anderson, Reid F. Thompson

Abstract: Failure of machine learning models to generalize to new data is a core problem limiting the reliability of AI systems, partly due to the lack of simple and robust methods for comparing new data to the original training dataset. We propose a standardized approach for assessing data similarity in a model-agnostic manner by constructing a supervised autoencoder for generalizability estimation (SAGE). We compare points in a low-dimensional embedded latent space, defining empirical probability measures for k-Nearest Neighbors (kNN) distance, reconstruction of inputs and task-based performance. As proof of concept for classification tasks, we use MNIST and CIFAR-10 to demonstrate how an ensemble output probability score can separate deformed images from a mixture of typical test examples, and how this SAGE score is robust to transformations of increasing severity. As further proof of concept, we extend this approach to a regression task using non-imaging data (UCI Abalone). In all cases, we show that out-of-the-box model performance increases after SAGE score filtering, even when applied to data from the model's own training and test datasets. Our out-of-distribution scoring method can be introduced during several steps of model construction and assessment, leading to future improvements in responsible deep learning implementation.
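Illustrative sketch (not the authors' SAGE implementation): the abstract mentions an empirical probability measure over kNN distance in a latent space. One way such a measure could look is shown below, assuming the score is the fraction of training points whose own kNN distance is at least as large as that of the new point. The function names, the choice of k = 5, and the random Gaussian vectors standing in for encoder outputs are all hypothetical; the reconstruction and task-performance components of the ensemble are not covered.

```python
import numpy as np

def knn_mean_distance(reference, queries, k, exclude_self=False):
    """Mean Euclidean distance from each query to its k nearest reference points."""
    # Pairwise distances, shape (n_queries, n_reference).
    diffs = queries[:, None, :] - reference[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    dists.sort(axis=1)
    # When queries are the reference set itself, skip the zero self-distance.
    start = 1 if exclude_self else 0
    return dists[:, start:start + k].mean(axis=1)

def empirical_tail_probability(train_values, new_values):
    """Fraction of training values that are >= each new value."""
    train_sorted = np.sort(train_values)
    # Count of training values strictly below each new value.
    below = np.searchsorted(train_sorted, new_values, side="left")
    return 1.0 - below / train_sorted.size

# Assumption: random vectors stand in for latent embeddings from a trained encoder.
rng = np.random.default_rng(0)
train_latent = rng.normal(size=(500, 8))
typical = rng.normal(size=(5, 8))            # drawn like the training data
shifted = rng.normal(loc=6.0, size=(5, 8))   # far from the training cloud

k = 5  # hypothetical neighborhood size
train_d = knn_mean_distance(train_latent, train_latent, k, exclude_self=True)
typical_score = empirical_tail_probability(
    train_d, knn_mean_distance(train_latent, typical, k))
shifted_score = empirical_tail_probability(
    train_d, knn_mean_distance(train_latent, shifted, k))
```

Under this assumed definition, a score near 1 means the point sits in a dense region of the training embedding, and a score near 0 means it is farther from its neighbors than almost every training point, so thresholding the score would filter out the shifted samples.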

Submitted 25 February, 2025; v1 submitted 22 February, 2025; originally announced February 2025.

Comments: 10 pages, 5 figures
