Data and AI cluster

Project: [NXP] Efficient AI models

Description

Many current AI models are too large and too expensive to run in practical applications. In this project you will explore one of the techniques below, or a combination of them, to make such models more efficient:

Speculative Decoding:

* Vision Language Model Speculative Decoding

* Grammar-based Drafters with Token Tree Verification

* Speculative Decoding for Diffusion Language Models

Sparsification:

* Knowledge-Distillation-based LLM Structured Pruning

* Task-specific LLM Structured Pruning

* VLM/LLM Token Pruning

* Vocabulary Pruning

* Context pruning / prompt pruning / KV-cache pruning

Quantization:

* Extreme few-bit Quantization (ternary, 2-bit, etc.)

* Mixed precision / mixed bit-width quantization

* CNN quantization

* Vision Language Action model quantization

* Diffusion LM quantization

Note: this project is external and requires interviews with the company.

Details

Supervisor: Joaquin Vanschoren
External location: NXP
Interested?: Get in contact