Data and AI cluster

Project: Understanding deep learning: linear vs non-linear models and their dynamics of learning

Description

While deep learning has become extremely important in industry and society, neural networks are often considered ‘black boxes’: it is widely believed to be impossible to understand how they really work. However, there are many aspects we can and should try to understand. Answers to the questions of how and when neural networks work will ultimately lead to safer, more robust and more explainable AI.

In this project, the aim is to gain insight into when and how non-linearities become important during training. Recent studies have shown that, during the training of non-linear networks, what is learned in the first phase is similar to what a linear model would learn [1, 2]. Other works have shown that, when we consider only what happens to the class means (the mean of the samples belonging to each class), linear and non-linear networks exhibit very similar behavior in terms of when and in what order they learn [2, 3]. In this project, you would try to answer the following questions in more detail:

Do non-linear networks first learn to distinguish between class means (i.e., do they behave equivalently to linear networks in a first phase), and do they only later leverage other aspects of the data distribution to arrive at a better accuracy?
Is this behavior consistent across different datasets, different architectures (fully connected, convolutional, transformer), different initial conditions, etc.?
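As an illustration of how the first question could be probed empirically, the sketch below trains a linear classifier and a small ReLU network side by side and records how often their test predictions agree at each training step. Everything in it is an assumption made for illustration and is not taken from the cited papers: the synthetic two-class Gaussian data (classes differ in mean and in variance), the full-batch gradient descent, the zero-initialized output layer, and the hypothetical names `make_data` and `train_and_compare`.

```python
import numpy as np

def make_data(n, rng, shift=1.0, dim=10):
    """Two Gaussian classes that differ in mean (axis 0) and in variance."""
    y = rng.integers(0, 2, size=n)
    x = rng.normal(size=(n, dim))
    x[y == 1] *= 1.5                    # class 1 has larger variance
    x[:, 0] += shift * (2 * y - 1)      # class means at -shift and +shift on axis 0
    return x, y

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_and_compare(steps=200, lr=0.1, hidden=32, seed=0):
    """Return rows of (step, agreement, linear accuracy, MLP accuracy) on test data."""
    rng = np.random.default_rng(seed)
    x_tr, y_tr = make_data(2000, rng)
    x_te, y_te = make_data(1000, rng)
    n, dim = x_tr.shape

    # Linear model (logistic regression), starting from the zero function.
    w, b = np.zeros(dim), 0.0
    # One-hidden-layer ReLU network; the output layer starts at zero
    # (an assumption) so that both models start from the same function.
    W1 = rng.normal(scale=1.0 / np.sqrt(dim), size=(dim, hidden))
    b1 = np.zeros(hidden)
    w2, b2 = np.zeros(hidden), 0.0

    history = []
    for step in range(steps):
        # Gradient step for the linear model (mean binary cross-entropy loss).
        g = (sigmoid(x_tr @ w + b) - y_tr) / n
        w -= lr * (x_tr.T @ g)
        b -= lr * g.sum()

        # Gradient step for the ReLU network (manual backpropagation).
        h = np.maximum(x_tr @ W1 + b1, 0.0)
        g = (sigmoid(h @ w2 + b2) - y_tr) / n
        gh = np.outer(g, w2) * (h > 0)  # gradient w.r.t. hidden pre-activations
        w2 -= lr * (h.T @ g)
        b2 -= lr * g.sum()
        W1 -= lr * (x_tr.T @ gh)
        b1 -= lr * gh.sum(axis=0)

        # Compare hard predictions of the two models on held-out data.
        lin_pred = ((x_te @ w + b) > 0).astype(int)
        mlp_pred = ((np.maximum(x_te @ W1 + b1, 0.0) @ w2 + b2) > 0).astype(int)
        history.append((step,
                        np.mean(lin_pred == mlp_pred),
                        np.mean(lin_pred == y_te),
                        np.mean(mlp_pred == y_te)))
    return np.array(history)

if __name__ == "__main__":
    hist = train_and_compare()
    for step, agree, lin_acc, mlp_acc in hist[::40]:
        print(f"step {int(step):3d}  agreement {agree:.2f}  "
              f"linear acc {lin_acc:.2f}  MLP acc {mlp_acc:.2f}")
```

In a setup like this, a phase in which agreement is high and both accuracies move together, followed by a phase in which the non-linear network pulls ahead, would be consistent with the hypothesis; a real study would of course need real datasets, several seeds and more careful measures than prediction agreement.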

You could take a fully empirical approach, or you could combine experiments with some theory. The final goal is to gain a better understanding of the relationship between non-linearities, properties of the dataset and the evolution of learning.
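One possible empirical control, sketched here under stated assumptions, is a "means-only" copy of a dataset: every sample is replaced by its class mean plus isotropic Gaussian noise, so that the class means are preserved while all other structure is discarded. If a network behaves the same on the original and on the copy early in training, that would be consistent with it relying only on class means in that phase. The function names `class_means` and `means_only_clone`, and the choice of isotropic noise, are hypothetical and not taken from the cited papers.

```python
import numpy as np

def class_means(x, y):
    """Return the sorted class labels and the per-class mean vectors.

    x has shape (n_samples, dim); y has shape (n_samples,).
    The returned means have shape (n_classes, dim).
    """
    labels = np.unique(y)
    means = np.stack([x[y == c].mean(axis=0) for c in labels])
    return labels, means

def means_only_clone(x, y, noise_std=1.0, seed=0):
    """Replace each sample by its class mean plus isotropic Gaussian noise."""
    rng = np.random.default_rng(seed)
    labels, means = class_means(x, y)
    idx = np.searchsorted(labels, y)    # row of `means` for each sample's label
    return means[idx] + rng.normal(scale=noise_std, size=x.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = rng.integers(0, 3, size=3000)
    x = rng.normal(size=(3000, 5)) + y[:, None]   # toy data: class c centered at c
    x_clone = means_only_clone(x, y)
    print(class_means(x, y)[1].round(2))
    print(class_means(x_clone, y)[1].round(2))
```

Training the same architecture on `x` and on `x_clone` and comparing the learning curves is one way to separate what is learned from the means from what is learned from the remaining structure of the data.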

Skills needed: basic linear algebra and basic understanding of differential equations, understanding of deep learning architectures, good programming skills.

The main supervisor of this project is Hannah Pinson.

[1] Refinetti, M., Ingrosso, A., & Goldt, S. (2023, July). Neural networks trained with SGD learn distributions of increasing complexity. In International Conference on Machine Learning (pp. 28843-28863). PMLR.

[2] Kalimeris, D., Kaplun, G., Nakkiran, P., Edelman, B., Yang, T., Barak, B., & Zhang, H. (2019). SGD on neural networks learns functions of increasing complexity. Advances in Neural Information Processing Systems, 32.

[3] Pinson, H., Lenaerts, J., & Ginis, V. (2023). Linear CNNs discover the statistical structure of the dataset using only the most dominant frequencies. In International Conference on Machine Learning (pp. 27876-27906). PMLR.
