
Project: Data-Efficient Reinforcement Learning under Constraints


Reinforcement learning (RL) is a general paradigm for learning, prediction, and decision-making that applies broadly across disciplines, including science, engineering, and the humanities. Classical RL approaches have seen prominent successes in closed-world problems such as Atari games, Go (AlphaGo), and robotics. Open-world problems (such as recommender systems, online advertising, and marketing campaigns) remain much harder: they involve a multi-step decision-making process in which a stream of interactions occurs between the user and the system. A key challenge is to leverage the reward signals from these interactions to build a scalable and performant recommendation inference model.

This close connection to practice poses unique research challenges: learning under constraints, learning from logged data, and data-efficient exploration. First, obtaining a high-fidelity simulator for such environments is extremely expensive (if possible at all), so practitioners in real-world domains have to learn from pre-collected, non-optimal data. Second, in standard RL the agent learns by trial and error from a scalar signal it receives from the environment, called the reward, and aims to maximize the average reward. This is insufficient in many settings, because the desired properties of the agent's behavior are better described by constraints; practitioners therefore have to mediate between competing objectives and constraints. Finally, classical RL algorithms are known to be data-hungry, so developing algorithms that explore data-efficiently is also a major concern for practitioners.
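To make "learning from logged data under constraints" concrete, here is a minimal sketch in Python. It is an illustration only, not the method used in this project. It assumes a single-step (bandit-style) setting rather than full multi-step RL, uses inverse propensity scoring (a standard off-policy estimator) to estimate the average reward and average cost of a new policy from data logged by another policy, and then checks the cost estimate against a limit. All names (ips_estimates, satisfies_constraint, the toy actions "a" and "b", and the numbers) are hypothetical.

```python
import random


def ips_estimates(logs, target_policy):
    """Estimate average reward and cost of target_policy from logged data.

    logs: list of (action, reward, cost, logging_prob) tuples, where
          logging_prob is the probability with which the logging policy
          chose that action.
    target_policy: dict mapping action -> probability under the new policy.
    """
    n = len(logs)
    reward_sum = 0.0
    cost_sum = 0.0
    for action, reward, cost, logging_prob in logs:
        # Importance weight: how much more (or less) likely the new
        # policy is to take this action than the logging policy was.
        weight = target_policy[action] / logging_prob
        reward_sum += weight * reward
        cost_sum += weight * cost
    return reward_sum / n, cost_sum / n


def satisfies_constraint(logs, target_policy, cost_limit):
    """Check whether the estimated average cost stays within cost_limit."""
    _, est_cost = ips_estimates(logs, target_policy)
    return est_cost <= cost_limit


# Hypothetical toy data: a uniform logging policy over two actions,
# with deterministic reward and cost per action (assumed for illustration).
random.seed(0)
logging_policy = {"a": 0.5, "b": 0.5}
true_reward = {"a": 1.0, "b": 2.0}
true_cost = {"a": 0.0, "b": 1.0}

logs = []
for _ in range(10000):
    action = random.choice(["a", "b"])
    logs.append(
        (action, true_reward[action], true_cost[action], logging_policy[action])
    )

# Candidate policy that mostly plays the cheap action "a".
target_policy = {"a": 0.8, "b": 0.2}
est_reward, est_cost = ips_estimates(logs, target_policy)
print(f"estimated reward: {est_reward:.3f}, estimated cost: {est_cost:.3f}")
print("within cost limit 0.3:", satisfies_constraint(logs, target_policy, 0.3))
```

In this toy setup the true average reward of the candidate policy is 0.8 * 1.0 + 0.2 * 2.0 = 1.2 and its true average cost is 0.2, so the estimates should land close to those values. A policy that always plays "b" earns more reward but has an average cost of 1.0, which violates a limit of 0.3; this is the kind of trade-off between objectives and constraints described above.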

This project aims to extend the capabilities of RL algorithms towards real-life concerns: learning under constraints, learning from logged data, and data-efficient exploration. In collaboration with KPN, we will bridge the gap between theory and practice for these problems.


You will need some affinity with probability theory and constrained optimization, as well as a basic understanding of RL algorithms, when doing the literature review for this project.

Additional literature:

[1] H. M. Le, C. Voloshin, and Y. Yue. Batch policy learning under constraints, 2019.

[2] D. Provodin, P. Gajane, M. Pechenizkiy, and M. Kaptein. An empirical evaluation of posterior sampling for constrained reinforcement learning, 2022.

[3] L. Zheng and L. Ratliff. Constrained upper confidence reinforcement learning. In Proceedings of the 2nd Conference on Learning for Dynamics and Control, 2020.

[4] H. Cai, K. Ren, W. Zhang, K. Malialis, J. Wang, Y. Yu, and D. Guo. Real-time bidding by reinforcement learning in display advertising. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. ACM, 2017.

Mykola Pechenizkiy
Secondary supervisor
Danil Provodin