Project: [NXP] Efficient AI models

Description

Current AI models are often too large and expensive to deploy in practical applications. You are asked to explore one of the techniques below, or a combination of them, to make these models more efficient:


Speculative Decoding:

* Vision Language Model Speculative Decoding

* Grammar-based Drafters with Token Tree Verification

* Speculative Decoding for Diffusion Language Models

Sparsification:

* Knowledge-Distillation-based LLM Structured Pruning

* Task-specific LLM Structured Pruning

* VLM/LLM Token Pruning

* Vocabulary Pruning

* Context pruning / prompt pruning / KV-cache pruning


Quantization:

* Extreme few-bit Quantization (ternary, 2-bit, etc.)

* Mixed precision / mixed bit-width quantization

* CNN quantization

* Vision Language Action model quantization

* Diffusion LM quantization


Note: This project is external and requires interviews with the company.

Details

Supervisor: Joaquin Vanschoren

External location: NXP