
Project: Generate fair (pseudo) samples for reinforcement learning


Reinforcement learning (RL) is a computational approach to automating goal-directed decision making. RL problems are typically formulated in the framework of multi-armed bandits or Markov decision processes (or their variants).
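To make the bandit framework concrete, here is a minimal sketch of a stochastic Bernoulli multi-armed bandit with an epsilon-greedy agent. It is purely illustrative; the arm means, the epsilon value, and all names are our own choices, not taken from the references:

```python
import random

def epsilon_greedy_bandit(arm_means, n_rounds=10000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a Bernoulli bandit: explore a random arm with
    probability epsilon, otherwise pull the arm with the best empirical mean."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms      # number of pulls per arm
    values = [0.0] * n_arms    # empirical mean reward per arm
    total_reward = 0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                       # explore
        else:
            arm = max(range(n_arms), key=values.__getitem__)  # exploit
        reward = 1 if rng.random() < arm_means[arm] else 0    # Bernoulli reward
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean
        total_reward += reward
    return total_reward, counts

total_reward, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

With enough rounds, the agent concentrates its pulls on the arm with the highest mean reward while still exploring occasionally.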

In some cases, RL solutions are sample-inefficient and costly. To address this issue, some works in the literature propose generating pseudo-samples [1, 2], while others use importance sampling to quantify the importance of individual samples [3].
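As a concrete illustration of pseudo-sample generation, the sketch below loosely follows the perturbed-history idea of [2]: before each pull, every arm's history of s real rewards is padded with i.i.d. Bernoulli(1/2) pseudo-rewards, and the agent acts greedily on the perturbed empirical means. All names and parameter values here are illustrative simplifications, not the authors' implementation:

```python
import random

def phe_bandit(arm_means, n_rounds=2000, a=1.0, seed=0):
    """Perturbed-history sketch: an arm with s real pulls gets int(a * s)
    i.i.d. Bernoulli(1/2) pseudo-rewards added to its history; the arm with
    the highest perturbed empirical mean is then pulled greedily."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    pulls = [0] * n_arms
    reward_sums = [0.0] * n_arms
    for _ in range(n_rounds):
        perturbed_means = []
        for i in range(n_arms):
            s = pulls[i]
            if s == 0:
                perturbed_means.append(float("inf"))  # pull each arm once first
                continue
            n_pseudo = int(a * s)
            # pseudo-rewards injected into the arm's observed history
            pseudo = sum(1 for _ in range(n_pseudo) if rng.random() < 0.5)
            perturbed_means.append((reward_sums[i] + pseudo) / (s + n_pseudo))
        arm = max(range(n_arms), key=perturbed_means.__getitem__)
        reward = 1 if rng.random() < arm_means[arm] else 0
        pulls[arm] += 1
        reward_sums[arm] += reward
    return pulls

pulls = phe_bandit([0.3, 0.7])
```

The randomness of the pseudo-rewards drives exploration, so no explicit exploration bonus or epsilon parameter is needed.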

Topics related to fairness in reinforcement learning [4] have recently received extensive interest, with various promising avenues in both academic and industry-oriented research. These works focus on understanding and mitigating discrimination based on sensitive characteristics such as gender, race, religion, physical ability, and sexual orientation.



In this project, we will start by performing a literature review of works that generate samples or measure the importance of samples in reinforcement learning. We will also study fairness approaches in reinforcement learning.

Then, we aim to propose a method of generating fair samples for reinforcement learning.

The degree to which the project leans towards theory or experiments can be tailored to the skills and interests of the participating student.



Prerequisites:

1. Theoretical knowledge of reinforcement learning and/or coding skills.

2. Previous knowledge of fair machine learning is not required, but it is a plus.


[1] Branislav Kveton, Csaba Szepesvári, Sharan Vaswani, Zheng Wen, Mohammad Ghavamzadeh, Tor Lattimore. Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits. In Proceedings of the 36th International Conference on Machine Learning, 2019.

[2] Branislav Kveton, Csaba Szepesvári, Mohammad Ghavamzadeh, Craig Boutilier. Perturbed-History Exploration in Stochastic Multi-Armed Bandits. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019.

[3] Jie Tang and Pieter Abbeel. On a Connection between Importance Sampling and the Likelihood Ratio Policy Gradient. In Advances in Neural Information Processing Systems, 2010.

[4] Pratik Gajane, Akrati Saxena, Maryam Tavakol, George Fletcher, Mykola Pechenizkiy. Survey on Fair Reinforcement Learning: Theory and Practice. arXiv:2205.10032

Mykola Pechenizkiy
Secondary supervisor: Pratik Gajane