
Project: Learning Decision Trees to Reduce the Sample Complexity of Offline Reinforcement Learning


Reinforcement Learning (RL) deals with problems that can be modeled as a Markov decision process (MDP) where the transition function is unknown. When some policy has already been executed and its experiences with the environment recorded in a dataset, an offline RL algorithm can use this dataset to compute a new policy without collecting more data [1]. However, most offline RL algorithms have requirements that can be impractical, such as the availability of a large amount of historical data with good coverage of the problem.

Prior work has exploited the factored representation of the problem to mitigate these impractical requirements. In this setting, the dynamics of each variable depend only on a small subset of the remaining variables. Previously, we investigated how to compute a new policy when the subset of variables is known [2] and when it must be learned [3].
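To make the factored setting concrete, the following sketch shows a transition model where each state variable is sampled independently given only a small set of parent variables. The variable names, parent sets, and probability values are illustrative assumptions, not taken from the project itself.

```python
import random

# Hypothetical factored MDP with three binary state variables X0, X1, X2.
# Each variable's next value depends only on its parents (an assumed structure):
parents = {0: [0], 1: [0, 1], 2: [2]}

# Conditional probability tables: parent values -> P(X_i' = 1 | parents).
cpts = {
    0: {(0,): 0.1, (1,): 0.9},
    1: {(0, 0): 0.2, (0, 1): 0.7, (1, 0): 0.4, (1, 1): 0.8},
    2: {(0,): 0.5, (1,): 0.5},
}

def sample_next_state(state):
    """Sample each variable independently, conditioning only on its parents."""
    next_state = []
    for i in sorted(parents):
        pa = tuple(state[j] for j in parents[i])
        p_one = cpts[i][pa]
        next_state.append(1 if random.random() < p_one else 0)
    return tuple(next_state)
```

Because each CPT ranges only over a variable's parents, the number of parameters grows with the largest parent set rather than with the full state space, which is what makes learning from a limited dataset feasible.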

In this project, we aim to exploit the factored structure further by representing the conditional probability tables using decision trees [4]. This approach should allow us to further reduce the sample complexity of the offline RL algorithms and provide more generalization capabilities.
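A decision tree can represent a conditional probability table more compactly by exploiting context-specific independence: when, in some contexts, a variable's distribution does not depend on all of its parents, several table entries collapse into one leaf. The sketch below is a hypothetical example (tree structure and probabilities are assumptions): with two binary parents a full table needs four entries, but if the second parent only matters when the first equals 1, three leaves suffice.

```python
# Decision-tree CPT for a binary variable with two binary parents.
# Nodes are ("split", parent_index, left_subtree, right_subtree) or ("leaf", p),
# where p = P(X' = 1 | path taken through the tree).
tree = ("split", 0,
        ("leaf", 0.2),            # parent 0 == 0: same probability for any value of parent 1
        ("split", 1,
         ("leaf", 0.4),           # parent 0 == 1, parent 1 == 0
         ("leaf", 0.8)))          # parent 0 == 1, parent 1 == 1

def tree_prob(node, parent_values):
    """Walk the tree to read off P(X' = 1 | parents)."""
    while node[0] == "split":
        _, idx, left, right = node
        node = left if parent_values[idx] == 0 else right
    return node[1]
```

Grouping contexts under shared leaves is also what provides the extra generalization: a single leaf is estimated from all transitions matching its path, not from one fully specified parent assignment.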


  1. Levine, S., Kumar, A., Tucker, G., and Fu, J. (2020). Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems. arXiv preprint arXiv:2005.01643.

  2. Simão, T. D. and Spaan, M. T. J. (2019). Safe Policy Improvement with Baseline Bootstrapping in Factored Environments. In AAAI, pages 4967–4974.

  3. Simão, T. D. and Spaan, M. T. J. (2019). Structure Learning for Safe Policy Improvement. In IJCAI, pages 3453–3459.

  4. Degris, T., Sigaud, O., and Wuillemin, P. (2006). Learning the Structure of Factored Markov Decision Processes in Reinforcement Learning Problems. In ICML, pages 257–264.

Thiago Simão