
Project: Self-adapting Automated Machine Learning

Description

There is an infinite number of ways to design a machine learning system, and many careful decisions need to be made based on prior experience. The field of automated machine learning (AutoML) aims to make these decisions in a data-driven, objective, and automated way.

In the real world, however, data often evolves, which can quickly make previously optimized models suboptimal. Ideally, AutoML systems should detect this and take corrective action, such as adapting previous models. The best way to do so depends on how the data changes over time: the change can be subtle or dramatic, and sudden or gradual. In some cases we only need to fine-tune a few hyperparameters; in others we may have to redesign the machine learning models entirely.
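A toy stream can make these distinctions concrete. The sketch below is a hypothetical illustration, not part of the project itself: a single feature x is drawn uniformly from [0, 1), and the label is 1 when x exceeds a threshold. The threshold starts at 0.3 and moves by `magnitude` (small for a subtle change, large for a dramatic one), either all at once (sudden) or linearly over `drift_width` steps (gradual). The function name, its parameters, and all numbers are assumptions chosen for illustration. Note that the drift literature often calls such a steady shift "incremental" drift and reserves "gradual" for streams that alternate between the old and the new concept.

```python
import random


def drifting_stream(n, drift_start, drift_width, magnitude=0.4, seed=0):
    """Yield n (x, y) pairs from a hypothetical toy stream whose concept drifts.

    The label is y = 1 if x > threshold. The threshold starts at 0.3 and
    moves by `magnitude` (small = subtle, large = dramatic). With
    drift_width == 0 the move happens at once at t = drift_start (sudden
    drift); otherwise it is spread linearly over drift_width steps
    (gradual, or "incremental", drift).
    """
    rng = random.Random(seed)
    for t in range(n):
        if t < drift_start:
            progress = 0.0  # old concept still in place
        elif drift_width == 0 or t >= drift_start + drift_width:
            progress = 1.0  # new concept fully in place
        else:
            progress = (t - drift_start) / drift_width  # partway through the drift
        threshold = 0.3 + magnitude * progress
        x = rng.random()
        yield x, int(x > threshold)


# Same starting concept, three different kinds of change:
sudden = list(drifting_stream(1000, drift_start=500, drift_width=0))
gradual = list(drifting_stream(1000, drift_start=500, drift_width=300))
subtle = list(drifting_stream(1000, drift_start=500, drift_width=0, magnitude=0.05))
```

A model tuned on the first 500 samples stays nearly correct on `subtle`, but is wrong on roughly 40% of the inputs after the drift in `sudden`, which illustrates why the right corrective action depends on the kind of change.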

Unfortunately, current concept drift detection algorithms cannot accurately characterize the type of change in the data. The challenge of this work is to characterize different types of concept drift and, ideally, to leverage this characterization to take the optimal corrective action in AutoML systems. There is some existing work on characterizing drift, but it has not yet been used to improve AutoML systems.
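As a rough illustration of what "characterizing" a drift could mean, the sketch below measures the Hellinger distance between the histogram of a reference window and each later, non-overlapping window. It then labels the resulting profile by magnitude (how far it moves) and speed (how many windows the rise takes). This is a hypothetical heuristic, not an established detector and not the method of the papers listed below. The thresholds `low` and `high`, the 90%-of-peak rule, the window size, and all function names are assumptions. It also only looks at one numeric feature scaled to [0, 1), so it sees changes in the input distribution P(x) but would miss a change in P(y | x) alone.

```python
import math
from collections import Counter


def histogram(values, bins=10):
    """Normalized histogram of values assumed to lie in [0, 1), equal-width bins."""
    counts = Counter(min(int(v * bins), bins - 1) for v in values)
    return [counts[b] / len(values) for b in range(bins)]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions: 0 if identical, 1 if disjoint."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def drift_profile(values, window, bins=10):
    """Distance between the first window and each later, non-overlapping window."""
    reference = histogram(values[:window], bins)
    return [hellinger(reference, histogram(values[i:i + window], bins))
            for i in range(window, len(values) - window + 1, window)]


def characterize(profile, low=0.1, high=0.3):
    """Hypothetical heuristic: describe a drift profile by magnitude and speed."""
    peak = max(profile, default=0.0)
    if peak < low:
        return "no drift"
    magnitude = "dramatic" if peak >= high else "subtle"
    # Windows needed to go from first exceeding `low` to reaching 90% of the peak.
    start = next(i for i, d in enumerate(profile) if d >= low)
    end = next(i for i, d in enumerate(profile) if d >= 0.9 * peak)
    speed = "sudden" if end - start <= 1 else "gradual"
    return f"{magnitude}, {speed}"


before = [(i % 100 + 0.5) / 100 for i in range(1000)]  # evenly spread over [0, 1)
after = [(i % 100 + 0.5) / 200 for i in range(1000)]   # squeezed into [0, 0.5)
profile = drift_profile(before + after, window=200)
print(characterize(profile))  # dramatic, sudden
```

Comparing only against the first window keeps the sketch short; a more realistic variant would slide the reference window forward once a drift has been absorbed.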

Alternatively, you could look for novel ways to sidestep this issue entirely. For instance, you could treat AutoML as a never-ending process that continuously re-optimizes, whether or not the underlying data changes. This might be achieved with evolutionary techniques that simply evolve pipelines along with the data. The challenge is then to work out how AutoML systems should operate in this setting. For instance, do you consider only the most recent 'window' of data? Do you keep a memory of good models and then choose or combine them based on what currently works best? Should you continuously train these models, or retrain them regularly (which might be too expensive)?
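One possible concrete reading of the window and memory questions above is sketched below: keep a fixed-size window of the most recent labelled samples and a pool of remembered models, then either pick the model with the best accuracy on that window or let all models vote, weighted by that accuracy. The class name, `window_size`, and the toy threshold models are hypothetical. The models are never retrained here, and no pipeline evolution is shown; in a real AutoML system the pool would hold pipelines found by the search, and when to retrain or evolve them remains the open question.

```python
from collections import deque


class WindowedModelMemory:
    """Hypothetical memory of models, scored only on the most recent window of data.

    Models are plain callables x -> predicted label.
    """

    def __init__(self, models, window_size=100):
        self.models = list(models)
        self.window = deque(maxlen=window_size)  # oldest samples fall out automatically

    def add_model(self, model):
        self.models.append(model)

    def update(self, x, y):
        self.window.append((x, y))

    def accuracy(self, model):
        if not self.window:
            return 0.0
        return sum(model(x) == y for x, y in self.window) / len(self.window)

    def best(self):
        # "Choose": the model that currently works best on recent data.
        return max(self.models, key=self.accuracy)

    def predict(self, x):
        return self.best()(x)

    def predict_combined(self, x):
        # "Combine": accuracy-weighted vote over all remembered models.
        votes = {}
        for model in self.models:
            label = model(x)
            votes[label] = votes.get(label, 0.0) + self.accuracy(model)
        return max(votes, key=votes.get)


def threshold_model(t):
    """Toy 'model': predicts 1 when x exceeds threshold t."""
    return lambda x: int(x > t)


old, new = threshold_model(0.3), threshold_model(0.7)
memory = WindowedModelMemory([old, new], window_size=50)

for concept in (0.3, 0.7):  # the true threshold drifts from 0.3 to 0.7
    for i in range(100):
        x = (i + 0.5) / 100
        memory.update(x, int(x > concept))
    print(concept, memory.predict(0.5), memory.predict_combined(0.5))
# prints "0.3 1 1" and then "0.7 0 0"
```

A `deque` with `maxlen` gives the "most recent window" for free. Re-scoring every model on every prediction costs time proportional to the number of models times the window size, so a practical version would cache the scores and update them incrementally.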

This work could be done on various data streams, but also, using deep learning models, on known image datasets with concept drift (e.g., WILDS).

Further reading:

  • Koh, P. W., et al. (2021). WILDS: A Benchmark of in-the-Wild Distribution Shifts. https://arxiv.org/abs/2012.07421
  • Webb, G., Lee, L., Goethals, B., & Petitjean, F. (2018). Analyzing concept drift and shift from sample data. Data Mining and Knowledge Discovery, 32. https://doi.org/10.1007/s10618-018-0554-1

Details

Supervisor: Joaquin Vanschoren