Every second, around 10⁷ to 10⁸ bits of information reach the human visual system (HVS) [IK01]. Because biological hardware has limited computational capacity, completely processing this massive sensory stream would be impossible. The HVS has therefore developed two mechanisms, foveation and fixation, that preserve perceptual performance while yielding substantial computational savings.
The human eye has a distinctive structure. Photoreceptors, the specialized cells that respond to light, are not evenly distributed across the retina at the rear of the eye: their density is highest at the center of the retina and decreases progressively with eccentricity [CSKH90]. As a result, when a human observer looks at a location in a real-world scene, the eye transmits a variable-resolution image to the high-level processing units of the brain. This is known as foveation.
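This eccentricity-dependent resolution can be approximated computationally by blending progressively blurred copies of an image according to each pixel's distance from the fixation point. The sketch below is illustrative only; the function name, the number of blur levels, and the linear eccentricity-to-blur mapping are assumptions for exposition, not a model from the cited works.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def foveate(image, fixation, num_levels=4):
    """Rough foveation sketch: keep the image sharp at the fixation point
    and increasingly blurred with eccentricity (distance from fixation)."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Eccentricity of every pixel, normalized to [0, 1].
    ecc = np.hypot(ys - fixation[0], xs - fixation[1])
    ecc = ecc / ecc.max()

    # Level 0 is the sharp original; later levels are progressively blurred.
    levels = [image] + [gaussian_filter(image, sigma=2.0 * k)
                        for k in range(1, num_levels)]

    # Map each pixel's eccentricity to a blur level and assemble the output.
    idx = np.minimum((ecc * num_levels).astype(int), num_levels - 1)
    out = np.zeros_like(image, dtype=float)
    for k, level in enumerate(levels):
        out[idx == k] = level[idx == k]
    return out
```

Because peripheral regions are represented by heavily blurred (hence highly compressible) samples, this behaves like the lossy compression described above: full fidelity near the gaze, coarse fidelity elsewhere.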
The foveation mechanism functions much like image compression, heavily compressing the regions away from the center of gaze. Foveation alone, however, could impair perception, since details in the periphery would be missed. To compensate, the HVS has evolved a set of eye movements that direct the gaze to specific points in a scene. The eye is drawn to salient parts of the scene as potential points of attention and further processing, allowing the brain to assemble a precise map of the scene from a sequence of variable-resolution images [IK01, BT09]. This scanning of the scene through eye movements is called the fixation mechanism.
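A common computational abstraction of this scanning behavior is a greedy winner-take-all scanpath over a saliency map with inhibition of return: repeatedly fixate the most salient location, then suppress its neighborhood so the next fixation lands elsewhere. The following is a minimal sketch under that assumption; the function name and parameters are hypothetical, not taken from [IK01, BT09].

```python
import numpy as np

def fixation_sequence(saliency, num_fixations=3, inhibition_radius=2):
    """Greedy winner-take-all scanpath: pick the most salient location,
    then suppress a square neighborhood around it (inhibition of return)."""
    sal = saliency.astype(float).copy()
    h, w = sal.shape
    fixations = []
    for _ in range(num_fixations):
        y, x = np.unravel_index(np.argmax(sal), sal.shape)
        fixations.append((int(y), int(x)))
        # Inhibition of return: mask out the just-visited neighborhood.
        y0, y1 = max(0, y - inhibition_radius), min(h, y + inhibition_radius + 1)
        x0, x1 = max(0, x - inhibition_radius), min(w, x + inhibition_radius + 1)
        sal[y0:y1, x0:x1] = -np.inf
    return fixations
```

Each fixation in the returned sequence would serve as the center for one foveated sample, so the scanpath trades a single uniform high-resolution capture for a few variable-resolution ones.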
Motivated by this biological intuition, several works have focused on leveraging foveation and fixation mechanisms in machine learning approaches [VBPP20, AE17, TLCR21, JWE21]. Studies show that artificial foveation and fixation mechanisms make neural network architectures more robust to noise and adversarial attacks, in addition to being more efficient. Furthermore, explicitly learning the fixation points enables the design of visualization techniques for interpreting the model's decisions.
It is often difficult to obtain a sufficiently large sample of high-quality data from industrial machinery for model training. Deep learning (DL) models, which can perform exceptionally well on a clean dataset, may underperform when exposed to inaccuracies or imbalance in the training data. To help the trained model reach the required performance, deep learning-driven data curation techniques focus on denoising, cleaning, balancing, and labeling existing datasets [ZG21]. Models inspired by the HVS typically learn fixation points that are task-dependent. In this thesis, we aim to use self-supervised learning strategies to learn task-independent fixation points, yielding a more effective approach to data curation.
References:
[AE17] Emre Akbas and Miguel P. Eckstein. Object detection through search with a foveated visual system. PLoS Computational Biology, 13(10):e1005743, 2017.
[BT09] Neil D. B. Bruce and John K. Tsotsos. Saliency, attention, and visual search: An information theoretic approach. Journal of Vision, 9(3):5, 2009.
[CSKH90] Christine A. Curcio, Kenneth R. Sloan, Robert E. Kalina, and Anita E. Hendrickson. Human photoreceptor topography. Journal of Comparative Neurology, 292(4):497–523, 1990.
[IK01] Laurent Itti and Christof Koch. Computational modelling of visual attention. Nature Reviews Neuroscience, 2(3):194–203, 2001.
[JWE21] Aditya Jonnalagadda, William Wang, and Miguel P. Eckstein. FoveaTer: Foveated transformer for image classification. arXiv preprint arXiv:2105.14173, 2021.
[TLCR21] Chittesh Thavamani, Mengtian Li, Nicolas Cebron, and Deva Ramanan. FOVEA: Foveated image magnification for autonomous navigation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15539–15548, 2021.
[VBPP20] Manish Reddy Vuyyuru, Andrzej Banburski, Nishka Pant, and Tomaso Poggio. Biologically inspired mechanisms for adversarial robustness. Advances in Neural Information Processing Systems, 33:2135–2146, 2020.
[ZG21] Jianjing Zhang and Robert X. Gao. Deep learning-driven data curation and model interpretation for smart manufacturing. Chinese Journal of Mechanical Engineering, 34(1):1–21, 2021.