-
On the Foundations of Shortcut Learning
Authors:
Katherine L. Hermann,
Hossein Mobahi,
Thomas Fel,
Michael C. Mozer
Abstract:
Deep-learning models can extract a rich assortment of features from data. Which features a model uses depends not only on \emph{predictivity} -- how reliably a feature indicates training-set labels -- but also on \emph{availability} -- how easily the feature can be extracted from inputs. The literature on shortcut learning has noted examples in which models privilege one feature over another, for…
▽ More
Deep-learning models can extract a rich assortment of features from data. Which features a model uses depends not only on \emph{predictivity} -- how reliably a feature indicates training-set labels -- but also on \emph{availability} -- how easily the feature can be extracted from inputs. The literature on shortcut learning has noted examples in which models privilege one feature over another, for example texture over shape and image backgrounds over foreground objects. Here, we test hypotheses about which input properties are more available to a model, and systematically study how predictivity and availability interact to shape models' feature use. We construct a minimal, explicit generative framework for synthesizing classification datasets with two latent features that vary in predictivity and in factors we hypothesize to relate to availability, and we quantify a model's shortcut bias -- its over-reliance on the shortcut (more available, less predictive) feature at the expense of the core (less available, more predictive) feature. We find that linear models are relatively unbiased, but introducing a single hidden layer with ReLU or Tanh units yields a bias. Our empirical findings are consistent with a theoretical account based on Neural Tangent Kernels. Finally, we study how models used in practice trade off predictivity and availability in naturalistic datasets, discovering availability manipulations which increase models' degree of shortcut bias. Taken together, these findings suggest that the propensity to learn shortcut features is a fundamental characteristic of deep nonlinear architectures warranting systematic study given its role in shaping how models solve tasks.
△ Less
Submitted 11 July, 2024; v1 submitted 24 October, 2023;
originally announced October 2023.
-
Getting aligned on representational alignment
Authors:
Ilia Sucholutsky,
Lukas Muttenthaler,
Adrian Weller,
Andi Peng,
Andreea Bobu,
Been Kim,
Bradley C. Love,
Christopher J. Cueva,
Erin Grant,
Iris Groen,
Jascha Achterberg,
Joshua B. Tenenbaum,
Katherine M. Collins,
Katherine L. Hermann,
Kerem Oktar,
Klaus Greff,
Martin N. Hebart,
Nathan Cloos,
Nikolaus Kriegeskorte,
Nori Jacoby,
Qiuyi Zhang,
Raja Marjieh,
Robert Geirhos,
Sherol Chen,
Simon Kornblith
, et al. (8 additional authors not shown)
Abstract:
Biological and artificial information processing systems form representations of the world that they can use to categorize, reason, plan, navigate, and make decisions. How can we measure the similarity between the representations formed by these diverse systems? Do similarities in representations then translate into similar behavior? If so, then how can a system's representations be modified to be…
▽ More
Biological and artificial information processing systems form representations of the world that they can use to categorize, reason, plan, navigate, and make decisions. How can we measure the similarity between the representations formed by these diverse systems? Do similarities in representations then translate into similar behavior? If so, then how can a system's representations be modified to better match those of another system? These questions pertaining to the study of representational alignment are at the heart of some of the most promising research areas in contemporary cognitive science, neuroscience, and machine learning. In this Perspective, we survey the exciting recent developments in representational alignment research in the fields of cognitive science, neuroscience, and machine learning. Despite their overlapping interests, there is limited knowledge transfer between these fields, so work in one field ends up duplicated in another, and useful innovations are not shared effectively. To improve communication, we propose a unifying framework that can serve as a common language for research on representational alignment, and map several streams of existing work across fields within our framework. We also lay out open problems in representational alignment where progress can benefit all three of these fields. We hope that this paper will catalyze cross-disciplinary collaboration and accelerate progress for all communities studying and developing information processing systems.
△ Less
Submitted 26 November, 2024; v1 submitted 18 October, 2023;
originally announced October 2023.
-
What shapes feature representations? Exploring datasets, architectures, and training
Authors:
Katherine L. Hermann,
Andrew K. Lampinen
Abstract:
In naturalistic learning problems, a model's input contains a wide range of features, some useful for the task at hand, and others not. Of the useful features, which ones does the model use? Of the task-irrelevant features, which ones does the model represent? Answers to these questions are important for understanding the basis of models' decisions, as well as for building models that learn versat…
▽ More
In naturalistic learning problems, a model's input contains a wide range of features, some useful for the task at hand, and others not. Of the useful features, which ones does the model use? Of the task-irrelevant features, which ones does the model represent? Answers to these questions are important for understanding the basis of models' decisions, as well as for building models that learn versatile, adaptable representations useful beyond the original training task. We study these questions using synthetic datasets in which the task-relevance of input features can be controlled directly. We find that when two features redundantly predict the labels, the model preferentially represents one, and its preference reflects what was most linearly decodable from the untrained model. Over training, task-relevant features are enhanced, and task-irrelevant features are partially suppressed. Interestingly, in some cases, an easier, weakly predictive feature can suppress a more strongly predictive, but more difficult one. Additionally, models trained to recognize both easy and hard features learn representations most similar to models that use only the easy feature. Further, easy features lead to more consistent representations across model runs than do hard features. Finally, models have greater representational similarity to an untrained model than to models trained on a different task. Our results highlight the complex processes that determine which features a model represents.
△ Less
Submitted 22 October, 2020; v1 submitted 22 June, 2020;
originally announced June 2020.
-
The Origins and Prevalence of Texture Bias in Convolutional Neural Networks
Authors:
Katherine L. Hermann,
Ting Chen,
Simon Kornblith
Abstract:
Recent work has indicated that, unlike humans, ImageNet-trained CNNs tend to classify images by texture rather than by shape. How pervasive is this bias, and where does it come from? We find that, when trained on datasets of images with conflicting shape and texture, CNNs learn to classify by shape at least as easily as by texture. What factors, then, produce the texture bias in CNNs trained on Im…
▽ More
Recent work has indicated that, unlike humans, ImageNet-trained CNNs tend to classify images by texture rather than by shape. How pervasive is this bias, and where does it come from? We find that, when trained on datasets of images with conflicting shape and texture, CNNs learn to classify by shape at least as easily as by texture. What factors, then, produce the texture bias in CNNs trained on ImageNet? Different unsupervised training objectives and different architectures have small but significant and largely independent effects on the level of texture bias. However, all objectives and architectures still lead to models that make texture-based classification decisions a majority of the time, even if shape information is decodable from their hidden representations. The effect of data augmentation is much larger. By taking less aggressive random crops at training time and applying simple, naturalistic augmentation (color distortion, noise, and blur), we train models that classify ambiguous images by shape a majority of the time, and outperform baselines on out-of-distribution test sets. Our results indicate that apparent differences in the way humans and ImageNet-trained CNNs process images may arise not primarily from differences in their internal workings, but from differences in the data that they see.
△ Less
Submitted 3 November, 2020; v1 submitted 20 November, 2019;
originally announced November 2019.