
Project: Your neural network is secretly wasting capacity: why (not) compress it?

Description

Please note this project is no longer available.


We train ever larger neural networks. However, several studies indicate that large parts of those networks do not actually contribute to their performance. For example, it has been shown that some layers and blocks of LLMs can be pruned without affecting performance (He et al., 2024), that some parts of trained networks effectively perform linear transformations (Razzhigaev et al., 2024; Pinson et al., 2024), and that the weights of some layers can simply be reset to their initial random values (Zhang et al., 2022).
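To make the "effectively linear" observation concrete, the sketch below probes a single layer by fitting the best least-squares affine approximation to its input-output map and measuring the relative residual. The toy ReLU layer and all sizes here are hypothetical stand-ins; in a real study you would capture activations from a trained network (e.g. with forward hooks) rather than construct the layer by hand.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "trained layer": an affine map followed by ReLU. This is a hypothetical
# stand-in; in practice the inputs/outputs would come from a real network.
W = rng.normal(size=(64, 64)) * 0.1
b = rng.normal(size=64) * 0.1
layer = lambda x: np.maximum(x @ W + b, 0.0)

# Sample inputs as the layer would see them.
X = rng.normal(size=(1024, 64))
Y = layer(X)

# Fit the best least-squares affine approximation to the layer
# (append a column of ones so the fit includes a bias term).
X_aug = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)
Y_lin = X_aug @ coef

# Relative residual: 0 would mean the layer is exactly affine.
rel_err = np.linalg.norm(Y - Y_lin) / np.linalg.norm(Y)
print(f"relative deviation from linearity: {rel_err:.3f}")
```

A small residual for a given layer would suggest that layer's nonlinearity is barely used, i.e. capacity that a compression strategy could target.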

In this project, you will perform a systematic study comparing the different metrics used to identify wasted capacity in large neural networks. You will then use these insights to derive new and/or optimal compression strategies.
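One family of such metrics measures how sensitive the network's output is to resetting individual layers, in the spirit of Zhang et al. (2022). The sketch below illustrates this on a toy randomly-initialized MLP (a hypothetical stand-in for a trained model): each layer in turn is replaced with fresh random weights and the relative change in output is recorded. Layers with small sensitivity are candidates for compression.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 3-layer MLP; a hypothetical stand-in for a trained network.
dims = [32, 32, 32, 32]
weights = [rng.normal(size=(dims[i], dims[i + 1])) * 0.2 for i in range(3)]

def forward(x, ws):
    """ReLU MLP forward pass with a linear final layer."""
    for W in ws[:-1]:
        x = np.maximum(x @ W, 0.0)
    return x @ ws[-1]

X = rng.normal(size=(256, dims[0]))
baseline = forward(X, weights)

# Re-initialization sensitivity per layer: how much does the output
# change when one layer is reset to fresh random values?
deltas = []
for i in range(len(weights)):
    ws = list(weights)
    ws[i] = rng.normal(size=ws[i].shape) * 0.2
    delta = np.linalg.norm(forward(X, ws) - baseline) / np.linalg.norm(baseline)
    deltas.append(delta)
    print(f"layer {i}: relative output change {delta:.2f}")
```

In a real study, the sensitivity would be measured as a drop in task accuracy rather than a raw output change, and compared against the linearity and pruning metrics from the references above.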

Required: experience with deep learning

References:

He, S., Sun, G., Shen, Z., & Li, A. (2024). What Matters in Transformers? Not All Attention is Needed (arXiv:2406.15786). arXiv. https://doi.org/10.48550/arXiv.2406.15786

Razzhigaev, A., Mikhalchuk, M., Goncharova, E., Gerasimenko, N., Oseledets, I., Dimitrov, D., & Kuznetsov, A. (2024). Your Transformer is Secretly Linear (arXiv:2405.12250). arXiv. https://doi.org/10.48550/arXiv.2405.12250


Pinson, H., Boland, A., Ginis, V., & Pechenizkiy, M. (2024). Exploring the development of complexity over depth and time in deep neural networks. ICML Workshop on High Dimensional Learning Dynamics. https://openreview.net/pdf?id=ZBU0mS0LdC


Zhang, C., Bengio, S., & Singer, Y. (2022). Are all layers created equal? Journal of Machine Learning Research, 23(67), 1-28. https://www.jmlr.org/papers/volume23/20-069/20-069.pdf


Details
Supervisor: Hannah Pinson