Project: Sustainable Large Language Models

Description

Project description:

Large Language Models (LLMs) are deep-learning models that achieve state-of-the-art performance on many NLP tasks. They typically consist of billions of weights, so storing the weights in float32 (4 bytes per weight) yields models that occupy several gigabytes. Such large models cannot be easily deployed on edge devices (e.g., smartphones), since they take up too much memory and consume too much energy. The problem, therefore, is to develop and/or apply techniques (e.g., quantization applied during or after training, new architectures) that lead to sustainable LLMs, namely power-efficient and/or lightweight LLMs.
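
To make the size argument concrete, below is a minimal, illustrative sketch of post-training symmetric int8 quantization of a single weight matrix in PyTorch. It is not one of the methods from the literature list below, only the basic idea of trading numerical precision for memory; the layer dimensions and the per-tensor scaling scheme are assumptions made for the example.

    import torch
    import torch.nn as nn

    def quantize_int8(weight: torch.Tensor):
        # Symmetric per-tensor quantization: w ≈ scale * q, with q stored as int8 in [-127, 127].
        scale = weight.abs().max() / 127.0
        q = torch.clamp(torch.round(weight / scale), -127, 127).to(torch.int8)
        return q, scale

    def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
        # Recover an approximation of the original float32 weights.
        return q.float() * scale

    # Arbitrary example layer; the dimensions are an assumption, not tied to any specific LLM.
    layer = nn.Linear(4096, 4096, bias=False)
    q, scale = quantize_int8(layer.weight.data)
    w_hat = dequantize(q, scale)

    fp32_mib = layer.weight.numel() * 4 / 2**20   # float32: 4 bytes per weight
    int8_mib = q.numel() * 1 / 2**20              # int8: 1 byte per weight
    print(f"{fp32_mib:.1f} MiB (float32) -> {int8_mib:.1f} MiB (int8)")
    print(f"mean absolute rounding error: {(layer.weight.data - w_hat).abs().mean().item():.6f}")

Going from 32-bit to 8-bit weights cuts the weight memory by a factor of four; several of the papers listed below push further, to 4 or even 2 bits per weight, while trying to control the resulting loss in accuracy.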

In this thesis, you will (a) study techniques for building more sustainable LLMs, (b) formulate and implement your own sustainable LLMs, and (c) design and carry out evaluations of your sustainable LLMs.
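
As one deliberately simple example of an evaluation axis for point (c), the sketch below tabulates the weight-storage footprint of hypothetical models as a function of bit-width (bytes = number of parameters × bits / 8). The parameter counts are illustrative assumptions; a real evaluation would, of course, also have to cover task accuracy and, ideally, latency and energy use.

    # Weight-storage footprint as a function of bit-width: bytes = n_params * bits / 8.
    # The parameter counts below are illustrative assumptions, not any specific model.
    def footprint_gib(n_params: int, bits: int) -> float:
        return n_params * bits / 8 / 2**30

    for n_params in (1_000_000_000, 7_000_000_000):   # e.g., 1B- and 7B-parameter models
        for bits in (32, 16, 8, 4, 2):
            print(f"{n_params / 1e9:.0f}B parameters at {bits:2d} bits/weight: "
                  f"{footprint_gib(n_params, bits):6.2f} GiB")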

Literature (examples):

  • Chee et al., “QuIP: 2-Bit Quantization of Large Language Models With Guarantees”, https://arxiv.org/abs/2307.13304
  • Liu et al., “LLM-QAT: Data-Free Quantization Aware Training for Large Language Models”, https://arxiv.org/abs/2305.17888
  • Bai et al., “Towards Efficient Post-training Quantization of Pre-trained Language Models”, https://arxiv.org/abs/2109.15082
  • Kuzmin et al., “Pruning vs Quantization: Which is Better?”, https://arxiv.org/abs/2307.02973
  • Park et al., “Quadapter: Adapter for GPT-2 Quantization”, https://arxiv.org/abs/2211.16912
  • Nagel et al., “Up or Down? Adaptive Rounding for Post-Training Quantization”, https://arxiv.org/abs/2004.10568
  • Tomczak, “Deep Generative Modeling”, https://link.springer.com/book/10.1007/978-3-030-93158-2

Prerequisites:

  • reading and understanding scientific literature
  • very good coding skills in Python using PyTorch and other ML libraries
  • good knowledge of Deep Learning and the basics of Generative AI
  • curious attitude, independence, and out-of-the-box thinking

Details

Supervisor: Jakub Tomczak