
Project: Causal perspective of fairness in reinforcement learning


Reinforcement learning (RL) is a computational approach to automating goal-directed decision making using the feedback observed by the learning agent. In this project, we will be using the framework of multi-armed bandits and Markov decision processes.

Observational data collected from real-world systems mostly reveals associations and correlations rather than causal structure. A key limitation of algorithms based solely on observational data is that they ignore the mechanisms that generate the data and may therefore draw wrong conclusions. In contrast, causal learning relies on additional knowledge structured as a model of causes and effects [1].
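As a minimal sketch of this gap, consider a two-armed bandit whose logged data was produced by a policy that depends on a hidden variable U (for example, a sensitive attribute). The model and all probabilities below are hypothetical, chosen only for illustration. The code compares the observational value E[R | A = a] with the interventional value E[R | do(A = a)] obtained by adjusting for U.

```python
# Hypothetical toy model: every number here is an illustrative assumption.
P_U = {0: 0.5, 1: 0.5}                      # P(U = u), hidden confounder
P_A1_GIVEN_U = {0: 0.1, 1: 0.9}             # logging policy: P(A = 1 | U = u)
P_R_GIVEN_AU = {(0, 0): 0.5, (0, 1): 0.9,   # P(R = 1 | A = a, U = u)
                (1, 0): 0.4, (1, 1): 0.8}


def p_action(a, u):
    """P(A = a | U = u) under the logging policy."""
    p1 = P_A1_GIVEN_U[u]
    return p1 if a == 1 else 1.0 - p1


def observational_value(a):
    """E[R | A = a]: what the logged data alone would suggest."""
    joint = {u: P_U[u] * p_action(a, u) for u in P_U}   # P(U = u, A = a)
    p_a = sum(joint.values())                           # P(A = a)
    return sum(joint[u] / p_a * P_R_GIVEN_AU[(a, u)] for u in P_U)


def interventional_value(a):
    """E[R | do(A = a)]: back-door adjustment over U."""
    return sum(P_U[u] * P_R_GIVEN_AU[(a, u)] for u in P_U)


for a in (0, 1):
    print(a, round(observational_value(a), 2), round(interventional_value(a), 2))
# prints:
# 0 0.54 0.7
# 1 0.76 0.6
```

In this toy model, arm 1 looks better in the logged data (0.76 versus 0.54) only because the logging policy mostly played it when U = 1, where rewards are high for both arms. Under intervention, arm 0 is better in every stratum of U, so a learner that trusts the correlations would prefer the worse arm.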

To address the above concerns, several works consider causal learning (see section 5.6 of [2] and references therein) in fairness-aware machine learning. In contrast, only a few works consider fairness in reinforcement learning from a causal perspective [3].



In this project, we will investigate how RL algorithms that rely merely on correlations in observational feedback can produce unfair outcomes. We aim to propose a causal approach to fair reinforcement learning that overcomes this weakness of the current state of the art.



1. A basic understanding of RL algorithms.

2. A basic understanding of causality in machine learning.



[1] Judea Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press.

[2] Dana Pessach, Erez Shmueli. A Review on Fairness in Machine Learning. ACM Comput. Surv. 55, 3, Article 51 (February 2022).

[3] Wen Huang, Lu Zhang, Xintao Wu. Achieving Counterfactual Fairness for Causal Bandit. In Proceedings of the AAAI Conference on Artificial Intelligence, 2022.

Mykola Pechenizkiy
Secondary supervisor
Pratik Gajane