Shortcut Learning in Deep Neural Networks. Which cues will your model choose to learn? (ICLR 2022)
Deep neural networks (DNNs) often rely on easily learned discriminative features, or cues, that are not essential to the problem at hand, a phenomenon known as shortcut learning. For instance, DNNs may recognize ducks based on the typical background scenery of lakes or streams rather than the ducks themselves. This shortcut bias limits the generalization of DNNs, particularly in more challenging test scenarios where these shortcuts are invalid.
To investigate the shortcut learning tendencies of deep learning models, we introduce a new training setup: the Wisconsin Card Sorting Test for Machine Learners (WCST-ML).
Wisconsin Card Sorting Test for Machine Learners
We envision a simple, yet powerful, setting. Imagine you were given this limited set of images and labels:

You are then asked to provide a label for the following image. You can choose between 0, 1 or 2, according to your understanding of the task:

What would you choose?
The problem with this question lies in its “ill-defined” nature. The images and labels in the training matrix are fully correlated with respect to at least two explicable high-level cues, for example shape and color. At training time, then, the classification task can be perfectly solved by relying on either feature; at test time, however, an image showcasing an unseen feature combination requires generalization.
If a human were to solve this task, several factors could play a role in the final decision. For example, prior knowledge about the world, the task, or even other related tasks might influence our beliefs about the underlying task to solve (yes, in a Bayesian sense!). Ideally, we would form a set of hypotheses (hopefully containing both shape and color as possible alternatives) and then make a choice given our prior and our observations (which we can update as we see more data). The important part is that, after we make our choice, that choice can be informative of our priors (and/or biases). In the absence of significantly different priors among participants, labeling the image as 0 would display a bias or preference toward shape, while labeling it as 1 would showcase a color-based task logic.
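To make the setup concrete, the fully correlated training matrix can be mimicked with a tiny synthetic dataset in which a shape-like cue and a color cue always co-occur. The sketch below is purely illustrative (a hypothetical `make_wcst_toy` helper, not code from the paper):

```python
import numpy as np

def make_wcst_toy(n_classes=3, size=8):
    """Build a tiny fully correlated training set: class i is rendered
    with shape i AND color i, so either cue alone solves the task.
    (Hypothetical toy; the papers use datasets such as DSprites/UTKFace.)"""
    X, y = [], []
    for c in range(n_classes):
        img = np.zeros((size, size, 3))
        # "shape" cue: a horizontal bar whose position depends on the class
        img[c * 2:c * 2 + 2, :, :] = 1.0
        # "color" cue: tint the whole image in the class's color channel
        img[..., c % 3] += 0.5
        X.append(img)
        y.append(c)
    return np.stack(X), np.array(y)

X, y = make_wcst_toy()  # X: (3, 8, 8, 3), y: [0, 1, 2]
```

A test image with an unseen combination (e.g. the bar of class 0 tinted with the color of class 1) is then exactly the ambiguous probe described above.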
During training, deep learning models have their own biases and have often been observed to collapse onto easy-to-learn cues, a phenomenon known as shortcut learning. As in the degenerate classification case above, available datasets typically underspecify the task to be solved, and the data is hardly ever comprehensive enough to fully constrain decision-making.
In our work, we perform extensive experiments to observe how different architectures react when put through a similarly underspecified visual discrimination task. Our first interesting finding is that, despite the different inductive biases of different architectures, we observe largely similar behavior across models in these settings. For example, we find that models reliably prefer certain cues, such as color, over others. Color is a particularly easy cue to fit in the simple example above, as we have some intuition of how it can be solved computationally by something akin to channel selection in the input space. Further experiments on more complex datasets, such as UTKFace, show that cues such as ethnicity, gender, and age also rank similarly across architectures, exposing potentially sensitive biases at inference time unless appropriate measures or human-in-the-loop solutions are in place.
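One simple way to quantify such cue preferences is to evaluate a trained model on cue-conflict images, where the shape cue and the color cue point to different labels, and measure which cue the predictions follow. A minimal, hypothetical sketch of that bookkeeping (not the paper's exact evaluation code):

```python
import numpy as np

def cue_preference(preds, shape_labels, color_labels):
    """Fraction of cue-conflict images on which a model's predictions
    agree with the shape cue vs. the color cue."""
    preds = np.asarray(preds)
    return {
        "shape": float(np.mean(preds == np.asarray(shape_labels))),
        "color": float(np.mean(preds == np.asarray(color_labels))),
    }

# Three conflict images: the shape cue says [0, 1, 2], the color cue says [1, 2, 0].
# A model predicting [1, 2, 0] is following color on every image.
print(cue_preference([1, 2, 0], [0, 1, 2], [1, 2, 0]))  # → {'shape': 0.0, 'color': 1.0}
```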

Interestingly, we find these biases are explicable from the point of view of the loss landscape: the set of solutions biased toward the preferred cues occupies a far greater volume (and tends to be flatter) than the set corresponding to the averted cues.
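As rough intuition for the flatness claim, local flatness can be probed by measuring how much the loss increases under small random weight perturbations; flatter minima show smaller increases. A crude, illustrative sketch (not the landscape analysis used in the paper):

```python
import numpy as np

def flatness_proxy(loss_fn, w, eps=0.1, n=100, seed=0):
    """Crude flatness proxy: mean loss increase under random perturbations
    of norm eps around weights w. Smaller values suggest a flatter minimum."""
    rng = np.random.default_rng(seed)
    base = loss_fn(w)
    increases = []
    for _ in range(n):
        d = rng.normal(size=w.shape)
        d *= eps / np.linalg.norm(d)  # rescale to a fixed perturbation radius
        increases.append(loss_fn(w + d) - base)
    return float(np.mean(increases))

# A sharper quadratic bowl yields a larger (less flat) proxy value.
sharp = flatness_proxy(lambda w: 10.0 * w @ w, np.zeros(3))
flat = flatness_proxy(lambda w: w @ w, np.zeros(3))
```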

We call this fully correlated test WCST-ML and propose a simple framework to explore any such biases in a similar manner. As long as a dataset has multiple labels for each input, and enough elements to generate a fully correlated subset, WCST-ML is easily applicable. Much more in-depth findings are shared in the ICLR publication, which help us understand what these shortcut biases look like during training, and potentially how to avoid them. These findings shed some light on shortcut learning in deep models, and emphasize the importance of solutions, such as active human intervention, to remove model biases that may cause negative societal impacts.
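As a sketch of how the construction generalizes: given any dataset whose samples carry K attribute labels (first mapped to a shared label space), the fully correlated training set is the "diagonal" subset where all K attributes agree, and the off-diagonal samples serve as probes for unseen combinations. A hypothetical helper, assuming that mapping has already been done:

```python
import numpy as np

def wcst_ml_split(attrs):
    """Given an (N, K) matrix of K attribute labels per sample, return the
    indices of the 'diagonal' samples (all K attributes agree -> fully
    correlated training set) and of the off-diagonal probe samples."""
    attrs = np.asarray(attrs)
    diag = np.all(attrs == attrs[:, :1], axis=1)
    return np.where(diag)[0], np.where(~diag)[0]

# Samples with (shape, color) labels: rows 0, 1, 3 are fully correlated.
train_idx, probe_idx = wcst_ml_split([[0, 0], [1, 1], [0, 1], [2, 2]])
```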
Shortcut Bias Mitigation via Ensemble Diversity Using Diffusion Probabilistic Models (NeurIPS 2023)
In a follow-up work, we propose a solution based on ensemble diversification. An effective approximation to the Bayesian ideal above is to let an ensemble of models entertain a range of diverse hypotheses while observing the degenerate training data. The diversity must be functional and is thus not trivial to enforce. We propose a method based on synthetic counterfactuals sampled from an appropriately trained Diffusion Probabilistic Model (DPM). We find that, at certain stages of training (an originative stage), DPMs are more likely to generate samples showing novel feature combinations.

Even when these samples lack fidelity, we find that they can be directly leveraged for ensemble diversification through model disagreement, enforcing functional diversity across models.
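Concretely, one way to turn disagreement on the synthetic samples into a training signal is to penalize the overlap between the predictive distributions of every pair of ensemble members. The sketch below is a minimal, assumed formulation of such a term (the actual method in the paper combines it with the standard task loss):

```python
import numpy as np

def disagreement_loss(probs):
    """Diversification term for an ensemble: average pairwise inner product
    of the models' predictive distributions on synthetic (DPM) samples.
    probs has shape (M, N, C): M models, N samples, C classes. The value is
    high when models agree, so minimizing it pushes them apart."""
    M = probs.shape[0]
    total, pairs = 0.0, 0
    for i in range(M):
        for j in range(i + 1, M):
            total += np.mean(np.sum(probs[i] * probs[j], axis=-1))
            pairs += 1
    return total / pairs
```

Two models making identical one-hot predictions score 1.0; models that never agree score 0.0.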
In the table below, you can see the fraction of models (out of an ensemble of 100) that attend to each cue, when trained without diversification (as in the baseline case above) and when trained with our diversification method.

This demonstrates that we can steer models to attend to non-shortcut cues, even without the need for expensive additional data collection.
For more information, check out the paper!