
Project: Task similarity and transferability in machine learning

Description

Humans are efficient learners because we can leverage prior experience when learning new tasks. For instance, a child first learns how to walk, and then efficiently learns how to run (obviously without starting from scratch).

Several areas of machine learning aim to mimic this ability. Transfer learning and continual learning aim to select and transfer previously learned representations/embeddings so that new tasks can be learned quickly. Meta-learning aims to do this even more efficiently by training over distributions of very similar prior tasks.
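To make the transfer-learning setting concrete, here is a minimal sketch (an illustration only, not part of the assignment): a backbone pretrained on a prior task is reused, and only a new task-specific head is trained on the new task. The choice of a torchvision ResNet-18 and a 10-class target task are assumptions made for the example.

```python
import torch
import torch.nn as nn
from torchvision import models

# Reuse a representation learned on a prior task (ImageNet pretraining here).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in backbone.parameters():
    p.requires_grad = False                 # freeze the transferred representation
backbone.fc = nn.Linear(backbone.fc.in_features, 10)   # new head for the new task

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def training_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimization step on the new task; only the new head is updated."""
    optimizer.zero_grad()
    loss = loss_fn(backbone(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```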

A key question here is: "When is a previous task similar enough that I can effectively transfer information from it?" If the task is similar, the transferred priors can speed up learning dramatically, but if it is different, such a prior can make learning the new task much less efficient (negative transfer).

There is a range of techniques to establish task similarity (see the links below). Many involve 'task embeddings' that map a given dataset/task to a vector representation; a toy version of this idea is sketched below. Another technique, from continual learning, is to keep a memory of models trained on previous tasks and try all of them on the new task to see which ones are worthy starting points for learning it (sketched after the list of references).
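As a toy illustration of the embedding idea (far simpler than Task2Vec or Dataset2Vec, and assuming the two tasks share the same feature space), a labelled dataset can be summarized by a few feature statistics and tasks compared via cosine similarity:

```python
# Toy 'task embedding': summarize a dataset by simple feature statistics and
# compare tasks via cosine similarity. Real methods (Task2Vec, Dataset2Vec,
# optimal-transport distances) are far more sophisticated; this only
# illustrates the map-dataset-to-vector idea.
import numpy as np

def embed_task(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Map a dataset (features X, labels y) to a fixed-length vector."""
    class_means = np.stack([X[y == c].mean(axis=0) for c in np.unique(y)])
    return np.concatenate([
        X.mean(axis=0),              # overall feature location
        X.std(axis=0),               # overall feature scale
        class_means.std(axis=0),     # how spread out the class centroids are
    ])

def task_similarity(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Cosine similarity between two task embeddings (1.0 = most similar)."""
    return float(emb_a @ emb_b / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b)))

# Example: two synthetic tasks over the same 5-dimensional feature space.
rng = np.random.default_rng(0)
X_a, y_a = rng.normal(size=(200, 5)), rng.integers(0, 3, size=200)
X_b, y_b = rng.normal(loc=0.5, size=(200, 5)), rng.integers(0, 3, size=200)
print(task_similarity(embed_task(X_a, y_a), embed_task(X_b, y_b)))
```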

This is a fundamental problem in machine learning. The challenge in this assignment is to catalog the range of proposed ideas for addressing it, identify which are most promising, and potentially implement some of them to perform an in-depth study of what really works in a variety of real-world settings.

  • Hand-designed task embeddings: https://arxiv.org/abs/1810.03548
  • Task2Vec: https://arxiv.org/abs/1810.03548
  • Optimal transport: https://www.microsoft.com/en-us/research/publication/geometric-dataset-distances-via-optimal-transport/
  • Metadata embeddings: https://arxiv.org/abs/1910.03698
  • Dataset2Vec: https://www.ismll.uni-hildesheim.de/pub/pdfs/jomaa2019c-nips.pdf
  • Distribution-based embeddings: https://arxiv.org/abs/2006.13708
  • Continual learning with task-driven priors: https://arxiv.org/abs/2012.12631
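
To illustrate the 'memory of models' idea mentioned above, the following hedged sketch uses small scikit-learn classifiers as stand-ins for previously trained models and scores each of them on a small labelled sample of the new task to pick a starting point. The data is synthetic and purely illustrative; in practice the memory would hold trained networks and the probe would be a few-shot evaluation or a quick fine-tune.

```python
# Keep models trained on previous tasks, probe each on the new task,
# and use the best-scoring one as the starting point for further learning.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_task(shift: float):
    """Synthetic binary task: Gaussian features shifted by `shift`,
    with a decision boundary that moves along with the shift."""
    X = rng.normal(size=(300, 4)) + shift
    y = (X[:, 0] + X[:, 1] > 2 * shift).astype(int)
    return X, y

# Memory of models trained on previous tasks.
memory = {}
for name, shift in [("task_A", 0.0), ("task_B", 1.0), ("task_C", 3.0)]:
    X, y = make_task(shift)
    memory[name] = LogisticRegression().fit(X, y)

# A new task arrives; probe every stored model on a small labelled sample.
X_new, y_new = make_task(0.9)            # closest to task_B by construction
probe_X, probe_y = X_new[:30], y_new[:30]
scores = {name: model.score(probe_X, probe_y) for name, model in memory.items()}
best = max(scores, key=scores.get)
print(scores, "-> start from", best)
```
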
Details
Supervisor
Joaquin Vanschoren
Secondary supervisor
Prabhant Singh
Interested?
Get in contact