Data and AI cluster

Project: Human Visual System Inspired Mechanisms for Interpretability

Description

Every second, around 10⁷ to 10⁸ bits of information reach the human visual system (HVS) [IK01]. Because biological hardware has limited computational capacity, processing all of this sensory input in full would be infeasible. The HVS has therefore developed two mechanisms, foveation and fixation, that save computation while preserving perceptual performance.

Photoreceptors, the light-sensitive cells of the retina at the back of the eye, are not evenly distributed. The density of cone photoreceptors, which support sharp vision, is highest at the center of the retina (the fovea) and decreases steeply with eccentricity, that is, with angular distance from the center [CSKH90]. As a result, when a human observer looks at a location in a scene, the eye transmits a variable-resolution image to the higher-level processing areas of the brain: fine detail at the point of gaze and progressively coarser detail toward the periphery. This is known as foveation.
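To make the variable-resolution idea concrete, here is a minimal NumPy sketch of artificial foveation, assuming a grayscale image stored as a 2D array and a single fixation point given in pixel coordinates. It is an illustration only, not the method of any cited work: the image is blurred with box filters of increasing radius, and each pixel takes its value from the blur level selected by its distance to the fixation. The names `box_blur` and `foveate` and the default blur radii are hypothetical choices.

```python
import numpy as np


def box_blur(img: np.ndarray, radius: int) -> np.ndarray:
    """Mean filter over a (2*radius+1) x (2*radius+1) window, edges replicated."""
    img = img.astype(float)
    if radius == 0:
        return img
    k = 2 * radius + 1
    kernel = np.ones(k) / k
    padded = np.pad(img, radius, mode="edge")
    # Separable filter: average along each row, then along each column.
    rows = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 0, rows)


def foveate(img: np.ndarray, fixation: tuple[int, int], radii=(0, 1, 2, 4)) -> np.ndarray:
    """Return a variable-resolution copy of a 2D image, sharp at `fixation` (hypothetical helper)."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Eccentricity: pixel distance from the fixation point.
    ecc = np.hypot(ys - fixation[0], xs - fixation[1])
    # Map eccentricity to a blur level: 0 at the fixation, coarsest at the far edge.
    level = np.floor(ecc / (ecc.max() + 1e-9) * len(radii)).astype(int)
    level = np.clip(level, 0, len(radii) - 1)
    # Stack of progressively blurred copies, shape (len(radii), h, w).
    stack = np.stack([box_blur(img, r) for r in radii])
    # Pick each pixel from the blur level assigned to its eccentricity.
    return np.take_along_axis(stack, level[None], axis=0)[0]
```

Pixels near the fixation keep their original values, while distant pixels are replaced by local averages, which is the sense in which foveation discards peripheral detail.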

Foveation therefore works much like image compression: regions away from the point of gaze are heavily compressed. However, foveation alone could impair perception, since details in the periphery may be missed. To compensate, the HVS relies on eye movements that direct the gaze to specific points in a scene. The gaze is drawn to salient regions, which become candidates for attention and further processing; integrating the resulting sequence of variable-resolution views allows the brain to build a detailed representation of the scene [IK01, BT09]. Exploring a scene through such a sequence of eye movements is called the fixation mechanism.
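Computationally, the choice of where to look is often modeled as a winner-take-all selection over a saliency map combined with inhibition of return, in the spirit of the saliency-based attention models reviewed in [IK01]. The sketch below is a toy version of that loop, assuming the saliency map is already given as a 2D NumPy array. The names `select_fixations`, `n_fixations`, and `ior_radius` are hypothetical, and HVS-inspired networks typically learn where to look rather than following a fixed rule like this one.

```python
import numpy as np


def select_fixations(saliency: np.ndarray, n_fixations: int = 3, ior_radius: int = 5) -> list[tuple[int, int]]:
    """Greedy winner-take-all with inhibition of return over a precomputed saliency map."""
    sal = saliency.astype(float).copy()
    h, w = sal.shape
    ys, xs = np.mgrid[0:h, 0:w]
    fixations = []
    for _ in range(n_fixations):
        # Winner-take-all: attend to the currently most salient pixel.
        r, c = np.unravel_index(np.argmax(sal), sal.shape)
        fixations.append((int(r), int(c)))
        # Inhibition of return: suppress a disk around the attended location
        # so the next fixation moves elsewhere.
        sal[(ys - r) ** 2 + (xs - c) ** 2 <= ior_radius**2] = -np.inf
    return fixations
```

The suppression step is what turns a single peak-finding operation into a scan path: without it, every iteration would return the same location.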

Motivated by this biological intuition, several works have incorporated foveation and fixation mechanisms into machine learning models [VBPP20, AE17, TLCR21, JWE21]. These studies report that artificial foveation and fixation can make neural network architectures more robust to noise and adversarial attacks, in addition to making them more efficient. Furthermore, when fixation points are learned explicitly, they can be visualized to help interpret the model’s decisions.

Because the human visual system naturally directs attention toward the most prominent features of a scene, the fixation points identified by HVS-inspired models are a natural resource for creating saliency maps. Applications of saliency maps include activity recognition, object segmentation, recognition, and detection, as well as gaze-aware compression and summarization [Bor18]. Interpretability, which aims to understand the reasoning behind the predictions of a deep learning model, is another significant use case. This thesis will examine fixation points as a tool for understanding a model’s output, since they mark the regions that influence the model’s decision.
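One simple way to turn a set of fixation points into a saliency map, assuming the fixations are given as (row, column) pixel coordinates, is to place a Gaussian at each fixation and scale the sum to [0, 1]; this mirrors how recorded human fixations are commonly smoothed into density maps for saliency evaluation. The function `fixation_saliency_map` and its `sigma` parameter are hypothetical illustrations, not the approach this thesis prescribes.

```python
import numpy as np


def fixation_saliency_map(shape: tuple[int, int], fixations, sigma: float = 3.0) -> np.ndarray:
    """Sum an isotropic Gaussian at each (row, col) fixation and scale to [0, 1]."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    sal = np.zeros((h, w))
    for r, c in fixations:
        # Each fixation spreads its saliency over a neighborhood of width ~sigma.
        sal += np.exp(-((ys - r) ** 2 + (xs - c) ** 2) / (2 * sigma**2))
    peak = sal.max()
    # Normalize so the most salient pixel has value 1 (an all-zero map is returned as is).
    return sal / peak if peak > 0 else sal
```

Such a map could be overlaid on the input image, for example using fixations produced by a trained HVS-inspired model, to show which regions the model attended to before making its prediction.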

References:

[AE17] Emre Akbas and Miguel P Eckstein. Object detection through search with a foveated visual system. PLoS Computational Biology, 13(10):e1005743, 2017.

[Bor18] Ali Borji. Saliency prediction in the deep learning era: Successes, limitations, and future challenges. arXiv preprint arXiv:1810.03716, 2018.

[BT09] Neil DB Bruce and John K Tsotsos. Saliency, attention, and visual search: An information theoretic approach. Journal of Vision, 9(3):5, 2009.

[CSKH90] Christine A Curcio, Kenneth R Sloan, Robert E Kalina, and Anita E Hendrickson. Human photoreceptor topography. Journal of Comparative Neurology, 292(4):497–523, 1990.

[IK01] Laurent Itti and Christof Koch. Computational modelling of visual attention. Nature Reviews Neuroscience, 2(3):194–203, 2001.

[JWE21] Aditya Jonnalagadda, William Wang, and Miguel P Eckstein. FoveaTer: Foveated transformer for image classification. arXiv preprint arXiv:2105.14173, 2021.

[TLCR21] Chittesh Thavamani, Mengtian Li, Nicolas Cebron, and Deva Ramanan. FOVEA: Foveated image magnification for autonomous navigation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15539–15548, 2021.

[VBPP20] Manish Reddy Vuyyuru, Andrzej Banburski, Nishka Pant, and Tomaso Poggio. Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems, 33:2135–2146, 2020.

Details

Supervisor: Elahe Arani
Secondary supervisor: Bahram Zonooz
Interested?: Get in contact