
Project: Exploring sparsity in lifelong learning

Description

In a dynamic world, deep neural networks (DNNs) must continually adapt to new data and environments. Unlike humans, who can learn continually without forgetting past knowledge, DNNs often suffer from catastrophic forgetting when exposed to new data, losing previously acquired information. This limitation hinders their ability to learn and perform multiple tasks over time. Continual learning [1, 2, 3] is crucial for enabling models to retain old knowledge while learning new tasks, ensuring long-term stability and adaptability. At the same time, DNNs remain heavily over-parameterized compared to the human brain. This thesis therefore aims to explore how sparsity can be incorporated into continual learning to address these challenges effectively.

We will investigate various continual learning methods as well as various pruning strategies. While magnitude-based pruning removes the weights with the smallest magnitudes, dynamic sparse training adjusts sparsity patterns during training, allowing the model to adapt its structure. The lottery ticket hypothesis [4] suggests that a sparse subnetwork can be identified and retrained for new tasks. Elastic Weight Consolidation (EWC) [5] and Synaptic Intelligence (SI) [6] protect important weights by adding regularization terms to the loss function that penalize significant changes. Sparse evolutionary training iteratively evolves sparse structures by adding or removing connections. Structured sparsity prunes entire neurons, filters, or layers, leading to efficient and interpretable models. We will additionally investigate sharpness-aware minimization (SAM) [7] and low-rank adaptation in the continual learning (CL) setting. By integrating these techniques, we aim to develop sparsity-driven continual learning methods that are both efficient and resilient to forgetting, drawing inspiration from the human brain’s ability to learn incrementally and retain essential knowledge.
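
As a concrete illustration, the following minimal Python/PyTorch sketch shows two of the ingredients described above: magnitude-based pruning, which zeroes the smallest-magnitude weights and keeps a binary mask per layer, and an EWC-style quadratic penalty that discourages changes to weights deemed important for previous tasks. The function names, the sparsity level, and the fisher_diag and old_params dictionaries are illustrative assumptions rather than code from the referenced works.

import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, sparsity: float = 0.8):
    """Zero out the smallest-magnitude weights and return a binary mask per layer."""
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:  # skip biases and normalization parameters
            continue
        k = int(sparsity * param.numel())
        if k == 0:
            masks[name] = torch.ones_like(param)
            continue
        # the k-th smallest magnitude acts as the pruning threshold
        threshold = param.abs().flatten().kthvalue(k).values
        mask = (param.abs() > threshold).float()
        param.data.mul_(mask)   # prune in place
        masks[name] = mask      # reuse the mask to keep pruned weights at zero
    return masks

def ewc_penalty(model: nn.Module, old_params: dict, fisher_diag: dict, lambda_ewc: float = 1.0):
    """Quadratic penalty on moving weights that were important for earlier tasks."""
    loss = torch.zeros((), device=next(model.parameters()).device)
    for name, param in model.named_parameters():
        if name in fisher_diag:
            loss = loss + (fisher_diag[name] * (param - old_params[name]) ** 2).sum()
    return lambda_ewc / 2.0 * loss

When training on a new task, the total objective would then be the task loss plus ewc_penalty(...), and the stored masks can be re-applied (or used to mask gradients) after each optimizer step so that pruned weights stay at zero across tasks.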

Primary contact - Shruthi Gowda (s.gowda@tue.nl)

References
[1] Pietro Buzzega, Matteo Boschini, Angelo Porrello, Davide Abati, and Simone Calderara. Dark experience for general continual learning: a strong, simple baseline. Advances in Neural Information Processing Systems, 33:15920–15930, 2020.
[2] Elahe Arani, Fahad Sarfraz, and Bahram Zonooz. Learning fast, learning slow: A general continual learning method based on complementary learning system. In International Conference on Learning Representations, 2022.
[3] Shruthi Gowda, Bahram Zonooz, and Elahe Arani. Dual cognitive architecture: Incorporating biases and multi-memory systems for lifelong learning. Transactions on Machine Learning Research, 2023.
[4] Jonathan Frankle and Michael Carbin. The lottery ticket hypothesis: Finding sparse, trainable neural networks. In International Conference on Learning Representations, 2019.
[5] James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526, 2017.
[6] Friedemann Zenke, Ben Poole, and Surya Ganguli. Continual learning through synaptic intelligence. In International Conference on Machine Learning, pages 3987–3995. PMLR, 2017.
[7] Pierre Foret, Ariel Kleiner, Hossein Mobahi, and Behnam Neyshabur. Sharpness-aware minimization for efficiently improving generalization. In International Conference on Learning Representations, 2021.
Details
Student: Shruthi Gowda
Supervisor: Bahram Zonooz
Secondary supervisor: Elahe Arani