One of our publications in ICLR 2022 investigates the issue of shortcut cue learning in Deep Neural Networks.
We envision a simple yet powerful setting. Imagine you were given a set of images like those in this figure:
The image labels in the matrix are fully correlated with at least two explicable high-level cues: shape and color. Both cues change whenever the label changes, so this classification task can be solved perfectly by relying on either one. After training on the images above, imagine a newly sampled image comes in and we must predict its label. The image could, for example, be the following:
If a human were to solve this task, they might rely on instinct, or some kind of prior, to form a set of hypotheses about what the underlying task actually is, and then choose how to label the new image. Labeling the image as 0 would likely mean choosing shape discrimination, while labeling it as 1 would likely mean discriminating based on color cues. Similarly, available datasets typically underspecify the task to be solved, as data is hardly ever comprehensive enough to fully constrain decision-making. During training, deep learning models tend to collapse onto easy-to-learn cues, a behaviour often known as shortcut bias.
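The ambiguity described above can be made concrete with a small sketch. It is only an illustration under stated assumptions: each image is reduced to two attribute values rather than pixels, and the `Sample` type, the integer encodings, and the two decision rules are hypothetical names introduced here, not part of the publication.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    # Hypothetical encoding: an image summarized by its two cues.
    shape: int  # assumed: 0 and 1 stand for two different shapes
    color: int  # assumed: 0 and 1 stand for two different colors


# Fully correlated training set: shape and color always agree with the label.
train = [(Sample(shape=y, color=y), y) for y in (0, 1) for _ in range(4)]


def predict_by_shape(x: Sample) -> int:
    """Decision rule that looks only at the shape cue."""
    return x.shape


def predict_by_color(x: Sample) -> int:
    """Decision rule that looks only at the color cue."""
    return x.color


def accuracy(rule, data) -> float:
    """Fraction of (sample, label) pairs the rule classifies correctly."""
    return sum(rule(x) == y for x, y in data) / len(data)


# Both rules fit the training data perfectly.
shape_train_acc = accuracy(predict_by_shape, train)
color_train_acc = accuracy(predict_by_color, train)

# A new image whose cues conflict: shape of class 0, color of class 1.
new_image = Sample(shape=0, color=1)
shape_answer = predict_by_shape(new_image)
color_answer = predict_by_color(new_image)
```

Both rules are indistinguishable on the training set, so the training data alone cannot say which of the two answers for `new_image` is the intended one.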
We perform extensive experiments to observe how different architectures react when put through a similarly underspecified visual discrimination task, and observe largely similar behaviour across architectures and initializations.
One of our first findings is that models reliably prefer certain cues over others. Color, for example, is a particularly easy cue to fit in the simple example above, as we have some intuition for how it can be solved computationally by something akin to channel selection in the input space. Other experiments on more complex datasets, such as UTKFace, show that cues such as ethnicity, gender and age also rank similarly across architectures, exposing potentially sensitive biases when inferences are made without appropriate safeguards or human-in-the-loop solutions.
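To illustrate the channel-selection intuition for color, here is a minimal sketch. It assumes synthetic images with a red or blue square on a faintly noisy background; `make_image`, `channel_mean` and `predict_color` are hypothetical helpers written for this example and are not taken from the paper's code.

```python
import random

random.seed(0)  # fixed seed so the synthetic images are reproducible


def make_image(color, size=8):
    """Build an RGB image as rows of [r, g, b] pixels.

    The foreground square is red when color == 0 and blue when color == 1.
    """
    img = [[[random.uniform(0.0, 0.1) for _ in range(3)]
            for _ in range(size)]
           for _ in range(size)]
    channel = 0 if color == 0 else 2
    for r in range(2, 6):
        for c in range(2, 6):
            img[r][c][channel] = 1.0
    return img


def channel_mean(img, channel):
    """Average intensity of one color channel over the whole image."""
    values = [pixel[channel] for row in img for pixel in row]
    return sum(values) / len(values)


def predict_color(img):
    """Channel selection: compare only the red and blue channel means."""
    return int(channel_mean(img, 2) > channel_mean(img, 0))


labels = [0, 1] * 10
preds = [predict_color(make_image(y)) for y in labels]
color_accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
```

The rule needs no notion of shape at all: a comparison between two input channels is enough, which gives some sense of why color may be a computationally cheap cue in this toy setting.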
Interestingly, we find that these biases are explicable from the point of view of the loss landscape: the set of solutions biased towards the preferred cues occupies a far greater volume (and tends to be flatter) than the set corresponding to the averted cues.
We call this fully correlated test WCST-ML and propose a simple framework for exploring any such biases in a similar manner. As long as a dataset has multiple labels for each input, and enough elements to generate a fully correlated dataset, WCST-ML should be easily applicable. Much more in-depth findings are shared in the ICLR publication, which help us understand what these shortcut biases look like during training, and potentially how to avoid them. These findings allow us to shed some light on shortcut learning in deep models, and emphasize the importance of solutions, such as active human intervention, to remove model biases that may cause negative societal impacts.
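As a rough sketch of how such a test could be set up on a multi-label dataset, the snippet below splits samples into a fully correlated ("diagonal") set for training and a conflicting ("off-diagonal") set for probing which cue a model follows. This is an assumption about one reasonable way to operationalize the idea, not the implementation from the publication; `wcst_ml_split`, `cue_preference` and the `color_biased` stand-in model are hypothetical names.

```python
from itertools import product


def wcst_ml_split(samples, cues):
    """Split samples into diagonal and off-diagonal sets.

    samples: list of dicts mapping each cue name to an integer class.
    cues: names of the cues under study.
    A sample is diagonal when all of its cue labels are equal.
    """
    diagonal, off_diagonal = [], []
    for s in samples:
        values = {s[c] for c in cues}
        (diagonal if len(values) == 1 else off_diagonal).append(s)
    return diagonal, off_diagonal


def cue_preference(predict, off_diagonal, cues):
    """Fraction of off-diagonal predictions that match each cue's label."""
    n = len(off_diagonal)
    return {c: sum(predict(s) == s[c] for s in off_diagonal) / n for c in cues}


cues = ("shape", "color")
# Toy dataset: every combination of two shape classes and two color classes.
samples = [{"shape": a, "color": b} for a, b in product(range(2), repeat=2)]
diagonal, off_diagonal = wcst_ml_split(samples, cues)


def color_biased(sample):
    """Hypothetical stand-in for a trained model that latched onto color."""
    return sample["color"]


preference = cue_preference(color_biased, off_diagonal, cues)
```

In practice `predict` would be a model trained only on the diagonal set and applied to the raw inputs; the per-cue agreement on off-diagonal samples then indicates which cue the model adopted.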