The current generation of dense transformer models is too computationally expensive for many practical applications. In this project, you will explore novel families of (post-transformer) architectures that are much more efficient.
Possible directions include:
* State-space models
* Hybrid architectures (e.g., state-space model + transformer)
* (Quantization-aware) diffusion LMs
* Mixture-of-Experts models and related optimizations (pruning, speculative decoding, etc.)
Note: this is an industry project and will require an interview with the company.
Joaquin Vanschoren