Project: Offline Model-based Reinforcement Learning

Description

Offline Reinforcement Learning (RL) addresses settings where online interaction is impractical, costly, or unsafe, enabling applications ranging from healthcare to robotics. Learning from offline data is challenging due to distributional shift, which induces extrapolation errors that cannot be corrected without further exploration. Model-free methods regularize the learned policy to stay close to the behavior policy, but their high sample complexity often leads to poor generalization. Model-based approaches, which first learn an empirical MDP from the offline data and then optimize policies through simulated interactions, improve sample efficiency. However, state-of-the-art methods either rely heavily on accurate uncertainty estimation or are overly conservative, which limits policy improvement.
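For context, recent model-based offline RL methods such as MOPO (Yu et al., 2020) address model exploitation by penalizing rewards in simulated rollouts with an ensemble-based uncertainty estimate. The following is a minimal sketch of that idea on a toy one-dimensional task; the dataset, the linear model class, and all function names (fit_linear_model, penalized_rollout) are illustrative assumptions, not part of this project.

```python
# Minimal sketch (not this project's method): uncertainty-penalized
# model-based offline RL in the spirit of MOPO. The toy 1-D dynamics,
# bootstrap ensemble, and function names are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Offline dataset: transitions (s, a, s') from an unknown behavior policy.
N = 2000
S = rng.uniform(-1.0, 1.0, size=N)
A = rng.uniform(-0.5, 0.5, size=N)
S_next = S + A + 0.01 * rng.normal(size=N)  # true (unknown) dynamics

def fit_linear_model(s, a, s_next, idx):
    """Least-squares fit of s' ~ w0*s + w1*a + b on a bootstrap sample."""
    X = np.column_stack([s[idx], a[idx], np.ones(len(idx))])
    w, *_ = np.linalg.lstsq(X, s_next[idx], rcond=None)
    return w

# Ensemble of dynamics models via bootstrap resampling of the dataset.
K = 5
ensemble = [fit_linear_model(S, A, S_next, rng.integers(0, N, N)) for _ in range(K)]

def predict(w, s, a):
    return w[0] * s + w[1] * a + w[2]

def penalized_rollout(s0, policy, horizon=5, lam=1.0):
    """Short model rollout whose reward is penalized by ensemble
    disagreement, discouraging the policy from exploiting model errors.
    (In practice a reward model is learned too; here reward -|s'| is known.)"""
    s, total = s0, 0.0
    for _ in range(horizon):
        a = policy(s)
        preds = np.array([predict(w, s, a) for w in ensemble])
        s_next = preds.mean()
        uncertainty = preds.std()                      # disagreement as an error proxy
        total += -np.abs(s_next) - lam * uncertainty   # penalized reward
        s = s_next
    return total

# Evaluate a simple proportional policy under the penalized model.
policy = lambda s: np.clip(-0.5 * s, -0.5, 0.5)
print("penalized return:", penalized_rollout(0.8, policy))
```

The penalty coefficient lam trades off conservatism against policy improvement, which is exactly the tension between uncertainty estimation and over-conservatism described above.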

In this project, we aim to design, analyze, and empirically evaluate improved algorithms for offline model-based RL. This involves a thorough review of the literature and the development of methods that enhance the reliability of learned dynamics models, generate informative augmented trajectories, and improve policy learning, together with theoretical and empirical insights into offline model-based RL.

Details
Supervisor
Maryam Tavakol
Interested?
Get in contact