
Project: Continual Structure from Motion


Autonomous vehicles and robots need 3D information such as depth and pose to traverse paths safely and correctly. Classical methods rely on hand-crafted features, which can fail in challenging scenarios such as low-texture scenes [1]. Although neural networks can be trained on monocular data to predict 3D structure in a supervised manner [2], depth annotation is expensive and hard to obtain. In contrast, recent methods co-train depth and pose estimation networks in a self-supervised manner [3]. Self-supervised depth and pose estimation has since been improved by adding losses and constraints from classical vision [4] and by adopting newer network architectures [5, 6].

However, to fully exploit the potential of self-supervised learning for 3D vision, networks must work on diverse incoming data, sourced from different regions with distinct road structures, in different weather conditions, etc. [1, 7]. Standard neural network training requires access to all of these diverse scenes from the beginning, which is not feasible for long-term deployment of models. When new, unseen environments are encountered, the networks often fail to generalize due to domain shift. Simply continuing the training on the new environment leads to ‘catastrophic forgetting’ of previously learned information, i.e. as the network learns to predict depth and pose in the new environment, its performance on the previously observed environment(s) drops.
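The effect described above can be reproduced in a deliberately simplified toy, sketched here under loud assumptions: a two-parameter linear regressor stands in for the depth and pose networks, and two synthetic "environments" with different input-to-target mappings stand in for domain shift. All names (`make_domain`, `train`, `mse`) and all numbers are hypothetical and chosen only for illustration; this is not a depth or pose pipeline.

```python
import random


def make_domain(rng, slope, intercept, n=100):
    """Sample noiseless (x, y) pairs from a linear mapping (hypothetical toy data)."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, slope * x + intercept) for x in xs]


def mse(params, data):
    """Mean squared error of the model y = w * x + b on a dataset."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(params, data, lr=0.1, steps=500):
    """Plain full-batch gradient descent on the squared error."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


rng = random.Random(0)
domain_a = make_domain(rng, slope=2.0, intercept=1.0)   # "previously observed environment"
domain_b = make_domain(rng, slope=-1.0, intercept=3.0)  # "new environment" (assumed shift)

params = train((0.0, 0.0), domain_a)       # learn environment A first
err_a_before = mse(params, domain_a)

params = train(params, domain_b)           # naively continue training on B only
err_a_after = mse(params, domain_a)
err_b_after = mse(params, domain_b)

print("error on A after training on A:", err_a_before)
print("error on A after training on B:", err_a_after)
print("error on B after training on B:", err_b_after)
```

In this sketch the error on environment A is near zero after the first stage and rises sharply once training continues on environment B alone, while the error on B becomes near zero. Real networks have far more capacity than this toy, so the conflict between environments is less absolute, but the same qualitative drop is what continual learning methods aim to prevent.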

Continual learning addresses this problem, with the aim of developing models that can learn from new scenes with different data distributions without forgetting previously learned information. This is representative of the real world, where robots and autonomous vehicles travel to previously unseen environments while also revisiting known ones. Although many works in continual learning have focused on preventing catastrophic forgetting in image classification [8, 9], comparable work is lacking for self-supervised depth and pose estimation. We are therefore looking for students to study catastrophic forgetting in depth and pose estimation and to develop methods that mitigate it.


[1] Ariel Gordon, Hanhan Li, Rico Jonschkowski, and Anelia Angelova. Depth from videos in the wild: Unsupervised monocular depth learning from unknown cameras. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8977–8986, 2019.

[2] Weihao Yuan, Xiaodong Gu, Zuozhuo Dai, Siyu Zhu, and Ping Tan. Newcrfs: Neural window fully-connected crfs for monocular depth estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2022.

[3] Hemang Chawla, Arnav Varma, Elahe Arani, and Bahram Zonooz. Multimodal scale consistency and awareness for monocular self-supervised depth estimation. In 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021.

[4] Vitor Guizilini, Rares Ambrus, Dian Chen, Sergey Zakharov, and Adrien Gaidon. Multi-frame self-supervised depth with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 160–170, 2022.

[5] Vitor Guizilini, Rares Ambrus, Sudeep Pillai, Allan Raventos, and Adrien Gaidon. 3d packing for self-supervised monocular depth estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2485–2494, 2020.

[6] Arnav Varma, Hemang Chawla, Bahram Zonooz, and Elahe Arani. Transformers in self-supervised monocular depth estimation with unknown camera intrinsics. In Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, pages 758–769. INSTICC, SciTePress, 2022.

[7] Hemang Chawla, Matti Jukola, Terence Brouns, Elahe Arani, and Bahram Zonooz. Crowdsourced 3d mapping: A combined multi-view geometry and self-supervised learning approach. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4750–4757. IEEE, 2020.

[8] Prashant Bhat, Bahram Zonooz, and Elahe Arani. Consistency is the key to further mitigating catastrophic forgetting in continual learning. In Conference on Lifelong Learning Agents, 2022.

[9] Fahad Sarfraz, Elahe Arani, and Bahram Zonooz. Synergy between synaptic consolidation and experience replay for general continual learning. In Conference on Lifelong Learning Agents, 2022.

Bahram Zonooz
Secondary supervisor
Elahe Arani