
Project: Towards Cognitive-inspired Adversarial Training Approach


Deep neural networks (DNNs) achieve superior performance on perception tasks, yet they suffer from fundamental shortcomings, and core questions remain about what a network actually learns. DNNs have been shown to rely on local texture information to make decisions, whereas humans rely on global shape information [1]. This texture bias can lead to undesirable shortcut learning, in which networks latch onto spurious cues in the training data that do not transfer to test settings [2,3]. DNNs have also been shown to be extremely sensitive to adversarial attacks [4]. Adversarial perturbations are carefully crafted, imperceptible noise added to the original data, yielding an image that is perceptually indistinguishable from the original. While humans still make correct predictions on such inputs, models are prone to erroneous predictions that can have disastrous consequences in safety-critical applications.
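To make the notion of an adversarial perturbation concrete, the following is a minimal sketch of the fast gradient sign method (FGSM) on a toy logistic classifier in NumPy. The weights, input, and budget `eps` are all illustrative assumptions; real attacks target deep networks via an autodiff framework.

```python
import numpy as np

# Hypothetical setup: a fixed "trained" linear (logistic) classifier.
rng = np.random.default_rng(0)
w = rng.normal(size=8)   # assumed weight vector
x = rng.normal(size=8)   # clean input
y = 1.0                  # true label in {0, 1}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_grad(w, x, y):
    # Gradient of the binary cross-entropy loss w.r.t. the INPUT x
    # (not the weights) -- this is what the attacker ascends.
    return (sigmoid(w @ x) - y) * w

eps = 0.1  # L-infinity perturbation budget
x_adv = x + eps * np.sign(input_grad(w, x, y))  # one FGSM step

clean_loss = -np.log(sigmoid(w @ x))
adv_loss = -np.log(sigmoid(w @ x_adv))
# The perturbation stays within the eps budget, yet the loss increases.
```

The key property is that `x_adv` differs from `x` by at most `eps` per coordinate, so the two inputs look alike, while the classifier's loss strictly increases.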

The ability of humans to be less vulnerable to shortcut learning and to identify objects irrespective of textural or adversarial perturbations can be attributed to high-level cognitive inductive biases in the brain [5]. Shape can serve as such an inductive bias, propelling networks to learn more generic, high-level abstractions [6]. Adversarial training is an effective defense technique for improving adversarial robustness; however, robustness is at odds with accuracy [7]. This trade-off between natural accuracy and adversarial robustness is one of the prevailing challenges in adversarial training.
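Adversarial training solves a min-max problem: an inner maximization crafts worst-case perturbations, and an outer minimization updates the weights on those perturbed inputs. The sketch below illustrates this loop on a toy linearly separable problem with a one-step (FGSM-style) inner attack; the dataset, budget `eps`, and learning rate are all illustrative assumptions, not the project's method.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy linearly separable data (assumed): label = sign of x1 + x2.
rng = np.random.default_rng(1)
n, d = 200, 2
X = rng.normal(size=(n, d))
y = (X @ np.array([1.0, 1.0]) > 0).astype(float)

w = np.zeros(d)
eps, lr = 0.3, 0.1   # perturbation budget and step size (assumptions)
for _ in range(200):
    # Inner maximization: one-step worst-case perturbation of each input.
    p = sigmoid(X @ w)
    X_adv = X + eps * np.sign((p - y)[:, None] * w[None, :])
    # Outer minimization: logistic-regression gradient step on X_adv.
    p_adv = sigmoid(X_adv @ w)
    w -= lr * X_adv.T @ (p_adv - y) / n

clean_acc = np.mean((sigmoid(X @ w) > 0.5) == (y > 0.5))
```

On deep networks this same loop (with a stronger multi-step inner attack) is where the accuracy-robustness trade-off appears: fitting the perturbed inputs typically costs natural accuracy.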

Therefore, solutions are needed that improve the trade-off between natural and adversarial accuracy. This work aims to incorporate cognitive biases into DNNs to develop an effective adversarial training scheme. Training individual networks in synchrony, one on the original visual data and one on implicit prior knowledge (such as shape), and sharing knowledge between them helps explore a more generic solution space, thereby making the learned representations more robust. The project will involve a literature survey of adversarial training algorithms, the implementation of different inductive biases in training schemes, and the proposal of a novel algorithm that improves the trade-off between natural and adversarial accuracies.
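One plausible realization of "sharing knowledge between networks trained in synchrony" is a mutual-distillation objective: each network minimizes its own task loss plus a consistency term pulling its predictions toward its peer's. The sketch below is only an assumed illustration of such a loss (the function name `mutual_loss`, the weighting `alpha`, and the example logits are all hypothetical, not the project's algorithm).

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    # KL divergence between rows of two probability matrices.
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def mutual_loss(logits_a, logits_b, labels, alpha=0.5):
    # Task loss (cross-entropy of network A) plus a symmetric
    # consistency term that lets the two peer networks share knowledge.
    pa, pb = softmax(logits_a), softmax(logits_b)
    ce = -np.log(pa[np.arange(len(labels)), labels])
    consistency = 0.5 * (kl(pa, pb) + kl(pb, pa))
    return ce + alpha * consistency

# Hypothetical logits from two peer networks on the same example:
# one seeing the raw image, one seeing a shape-biased view (e.g. edges).
logits_texture = np.array([[2.0, 0.5, -1.0]])
logits_shape = np.array([[1.5, 0.8, -0.5]])
labels = np.array([0])
```

The consistency term vanishes when the two networks agree, so it only penalizes divergence between the texture-driven and shape-driven views of the same input.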


[1] Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A. Wichmann, and Wieland Brendel. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In International Conference on Learning Representations, 2018.

[2] Jason Jo and Yoshua Bengio. Measuring the tendency of CNNs to learn surface statistical regularities. arXiv preprint arXiv:1711.11561, 2017.

[3] Kai Yuanqing Xiao, Logan Engstrom, Andrew Ilyas, and Aleksander Madry. Noise or signal: The role of image backgrounds in object recognition. In International Conference on Learning Representations, 2020.

[4] Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. In International Conference on Learning Representations, 2018.

[5] Judea Pearl and Dana Mackenzie. The Book of Why: The New Science of Cause and Effect. Basic Books, 2018.

[6] Shruthi Gowda, Bahram Zonooz, and Elahe Arani. InBiaseD: Inductive bias distillation to improve generalization and robustness through shape-awareness. arXiv preprint arXiv:2206.05846, 2022.

[7] Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry. Robustness may be at odds with accuracy. In International Conference on Learning Representations, 2018.

Supervisor: Elahe Arani
Secondary supervisor: Bahram Zonooz