
Project: Model Transfer for Offline Reinforcement Learning

Description

Offline Reinforcement Learning (RL) addresses problems where simulation or online interaction is impractical, costly, or dangerous, enabling automation in a wide range of applications from healthcare and education to finance and robotics. However, learning new policies from offline data suffers from distributional shift, which causes extrapolation errors that cannot be corrected by additional exploration. Model-free offline RL algorithms, which regularize the learned policy to stay close to the behavior policy, have limited generalization ability due to sample complexity issues. Model-based RL approaches, which first learn an empirical MDP from the offline dataset and then explore freely in the learned environment to search for optimal policies, can therefore attain excellent sample efficiency. Nevertheless, state-of-the-art model-based methods either depend on accurate uncertainty estimation techniques or remain overly conservative, restricting policies to the support of the data.
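To make the model-based recipe above concrete, here is a minimal, self-contained sketch: fit a dynamics model on a fixed offline dataset and then generate synthetic rollouts inside the learned model. The linear least-squares model, the randomly generated transitions, and all dimensions are illustrative assumptions, not part of the project description.

```python
# A minimal sketch (not the project's method) of model-based offline RL:
# fit an empirical dynamics model on fixed (state, action, next_state)
# transitions, then "explore" by rolling out policies inside that model.
import numpy as np

rng = np.random.default_rng(0)

# --- offline dataset: transitions collected by some unknown behavior policy ---
state_dim, action_dim, n_transitions = 4, 2, 5000
states = rng.normal(size=(n_transitions, state_dim))
actions = rng.normal(size=(n_transitions, action_dim))
true_A = rng.normal(scale=0.3, size=(state_dim, state_dim))
true_B = rng.normal(scale=0.3, size=(state_dim, action_dim))
next_states = states @ true_A.T + actions @ true_B.T \
    + 0.01 * rng.normal(size=(n_transitions, state_dim))

# --- step 1: learn the empirical dynamics model from the offline data ---
X = np.hstack([states, actions])                      # model input: (s, a)
W, *_ = np.linalg.lstsq(X, next_states, rcond=None)   # least-squares fit of s' from [s; a]

def learned_dynamics(s, a):
    """Predict the next state with the fitted model."""
    return np.concatenate([s, a]) @ W

# --- step 2: generate synthetic rollouts inside the learned model ---
def rollout(policy, s0, horizon=20):
    s, trajectory = s0, []
    for _ in range(horizon):
        a = policy(s)
        s_next = learned_dynamics(s, a)
        trajectory.append((s, a, s_next))
        s = s_next
    return trajectory

random_policy = lambda s: rng.normal(size=action_dim)
synthetic = rollout(random_policy, states[0])
print(f"generated {len(synthetic)} model-based transitions")
```

In a real method, the synthetic transitions would feed a policy-optimization step, and the model would typically be an ensemble or a generative model rather than a single linear fit.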

In this project, we aim to develop a cross-domain generative dynamics model that can generalize knowledge across various reinforcement learning tasks. One potential direction is to draw inspiration from "transformers in RL" to build such a universal generative model, which can then be employed in model-based offline RL techniques to improve sample efficiency.
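As a rough illustration, the sketch below shows one way a task-conditioned, transformer-based dynamics model could be set up, loosely in the spirit of trajectory/decision transformers. The architecture, the task-embedding scheme, and all hyperparameters are assumptions made for illustration, not the project's prescribed design.

```python
# A hedged sketch of a cross-domain generative dynamics model:
# a causal transformer over (state, action) tokens, conditioned on a task
# embedding, trained to predict the next state.
import torch
import torch.nn as nn

class CrossTaskDynamicsTransformer(nn.Module):
    def __init__(self, state_dim, action_dim, n_tasks,
                 d_model=64, n_layers=2, n_heads=4):
        super().__init__()
        self.input_proj = nn.Linear(state_dim + action_dim, d_model)  # embed (s, a) pairs
        self.task_embed = nn.Embedding(n_tasks, d_model)              # per-task context
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.next_state_head = nn.Linear(d_model, state_dim)          # predict s_{t+1}

    def forward(self, states, actions, task_ids):
        # states: (B, T, state_dim), actions: (B, T, action_dim), task_ids: (B,)
        tokens = self.input_proj(torch.cat([states, actions], dim=-1))
        tokens = tokens + self.task_embed(task_ids)[:, None, :]       # condition on the task
        T = tokens.shape[1]
        # causal mask so each step only attends to past steps
        causal_mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.encoder(tokens, mask=causal_mask)
        return self.next_state_head(h)                                # predicted next states

# toy usage: one gradient step of next-state prediction on batched trajectories
model = CrossTaskDynamicsTransformer(state_dim=4, action_dim=2, n_tasks=3)
states = torch.randn(8, 10, 4)
actions = torch.randn(8, 10, 2)
next_states = torch.randn(8, 10, 4)
task_ids = torch.randint(0, 3, (8,))
loss = nn.functional.mse_loss(model(states, actions, task_ids), next_states)
loss.backward()
print(f"next-state prediction loss: {loss.item():.3f}")
```

Whether the model should be deterministic, probabilistic, or fully generative (e.g. predicting token distributions over discretized states) is exactly the kind of design question the project is meant to explore.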

The initial step involves an extensive literature review of offline RL (both model-free and model-based), transformers in RL, and transfer learning in RL. The following references should provide a good starting point for your literature review:




Details
Supervisor
Maryam Tavakol