Data and AI cluster

Project: Understanding deep learning: the initial learning rate

Description

This project has been completed and is now closed.

While deep learning has become extremely important in industry and society, neural networks are often considered ‘black boxes’, i.e., it is often believed to be impossible to understand how they really work. However, there are many aspects that we can and should try to understand: answering the questions of how and when neural networks work will ultimately lead to safer, more robust and more explainable AI.

In this project, the aim is to gain insight into the initial learning rate: how large can we make the learning rate in the initial phase of training deep neural networks? Some recent work relates this maximal learning rate to the neural network architecture [1], while other work relates it to properties of the dataset [2]. Building on these insights, your goal will be to figure out how the architecture and the dataset together determine the maximal initial learning rate. Does it change when we increase the width or depth, when we change the architecture, or when we switch to a different dataset? Ideally, we would together arrive at some insight into why this happens, which could lead to guidelines for setting the learning rate in practice.
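To make the question concrete, here is a minimal sketch under simplifying assumptions that are not taken from [1] or [2]: a linear model (not a deep network) trained with full-batch gradient descent on a squared loss, and an operational definition of the "maximal learning rate" as the largest step size η for which the loss does not exceed 1000 times its initial value within 200 steps. For this toy model, classical stability analysis of gradient descent on a quadratic gives the threshold 2/λ_max, where λ_max is the largest eigenvalue of XᵀX/n, so the threshold is set entirely by the data. The hypothetical helpers make_data, diverges, max_stable_lr and top_eigenvalue find the threshold empirically by log-space bisection and compute 2/λ_max for comparison. In a deep network the architecture would also enter, which is what the project sets out to study.

```python
import math
import random


def make_data(n, scales, seed=0):
    """Hypothetical toy dataset: feature j ~ N(0, scales[j]^2), noise-free target y = sum_j x_j."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, s) for s in scales] for _ in range(n)]
    y = [sum(x) for x in X]  # the true weights are all ones
    return X, y


def loss(w, X, y):
    """Squared loss L(w) = 1/(2n) * sum_i (w . x_i - y_i)^2."""
    return sum((sum(wj * xj for wj, xj in zip(w, x)) - yi) ** 2
               for x, yi in zip(X, y)) / (2 * len(X))


def diverges(lr, X, y, steps=200, blowup=1e3):
    """Assumed criterion: full-batch GD from w = 0 'diverges' if the loss exceeds
    blowup * initial loss (or becomes non-finite) within `steps` steps."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    loss0 = loss(w, X, y)
    for _ in range(steps):
        grad = [0.0] * d
        for x, yi in zip(X, y):
            r = sum(wj * xj for wj, xj in zip(w, x)) - yi  # residual of sample i
            for j in range(d):
                grad[j] += r * x[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
        cur = loss(w, X, y)
        if not math.isfinite(cur) or cur > blowup * loss0:
            return True
    return False


def max_stable_lr(X, y, lo=1e-4, hi=10.0, iters=30):
    """Bisection in log space for the largest non-diverging learning rate.
    Assumes lo is stable and hi diverges."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if diverges(mid, X, y):
            hi = mid
        else:
            lo = mid
    return lo


def top_eigenvalue(X, iters=100):
    """Largest eigenvalue of the loss Hessian H = X^T X / n, via power iteration."""
    n, d = len(X), len(X[0])
    H = [[sum(x[a] * x[b] for x in X) / n for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        hv = [sum(H[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(c * c for c in hv))
        v = [c / norm for c in hv]
    hv = [sum(H[a][b] * v[b] for b in range(d)) for a in range(d)]
    return sum(vi * hvi for vi, hvi in zip(v, hv))  # Rayleigh quotient, v has unit norm


if __name__ == "__main__":
    X, y = make_data(40, [1.0, 3.0])
    lam = top_eigenvalue(X)
    print(f"empirical maximal learning rate: {max_stable_lr(X, y):.4f}")
    print(f"theoretical 2 / lambda_max:      {2 / lam:.4f}")
```

Running the script should print two closely matching numbers. Rescaling a feature (for example, make_data(40, [1.0, 6.0]) instead of [1.0, 3.0]) increases λ_max and therefore lowers the threshold, a toy instance of the dependence on the dataset. Replacing the linear model inside diverges with a deep network of varying width and depth would be one possible starting point for experiments of the kind this project describes.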

Skills needed: basic linear algebra, a basic understanding of differential equations, an understanding of deep learning architectures, and good programming skills.

[1] Iyer, G., Hanin, B., & Rolnick, D. (2023). Maximal initial learning rates in deep ReLU networks. In International Conference on Machine Learning (pp. 14500-14530). PMLR.

[2] Saxe, A. M., McClelland, J. L., & Ganguli, S. (2019). A mathematical theory of semantic development in deep neural networks. Proceedings of the National Academy of Sciences, 116(23), 11537-11546.

Details

Supervisor: Hannah Pinson