Data and AI cluster

Project: Synthetic data generation for causal learning

Description

Arguably, a major difficulty in improving causal inference is the scarcity of suitable data: observational data are abundant, but interventional data are not. This internal project aims to create software tools that generate data for testing causal learning approaches. Starting from a ground-truth model, one can generate observations and run interventions, then combine the results into a benchmark dataset for testing causal inference approaches, which do not have access to the ground truth. Generating such benchmarks poses multiple challenges, which will be explored in this project, including the relation to credal networks.
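As a rough illustration of the idea of generating observations and interventions from a ground-truth model, the sketch below samples from a small structural causal model. Everything in it is an assumption made for illustration and not part of the project itself: the three binary variables (Z, X, Y), the graph Z -> X, Z -> Y, X -> Y, the probabilities, and the helper names `sample` and `mean_y`. An intervention do(X=1) is simulated by replacing the mechanism of X with a constant while leaving the other mechanisms untouched.

```python
import random

# Hypothetical ground-truth model over binary variables with graph
# Z -> X, Z -> Y, X -> Y. Each mechanism maps the values sampled so far
# (which include its parents) and a random generator to a new value.
MECHANISMS = {
    "Z": lambda v, rng: int(rng.random() < 0.5),
    "X": lambda v, rng: int(rng.random() < (0.9 if v["Z"] else 0.1)),
    "Y": lambda v, rng: int(rng.random() < 0.1 + 0.2 * v["X"] + 0.6 * v["Z"]),
}
ORDER = ["Z", "X", "Y"]  # a topological order of the graph


def sample(n, do=None, seed=0):
    """Draw n samples; variables listed in `do` are fixed by intervention."""
    do = do or {}
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        values = {}
        for name in ORDER:
            if name in do:
                values[name] = do[name]  # mechanism replaced by a constant
            else:
                values[name] = MECHANISMS[name](values, rng)
        rows.append(values)
    return rows


def mean_y(rows):
    """Empirical frequency of Y = 1 in a list of samples."""
    return sum(r["Y"] for r in rows) / len(rows)


observational = sample(20000, seed=1)
interventional = sample(20000, do={"X": 1}, seed=2)

# A toy "benchmark": the data only, without the ground-truth mechanisms.
benchmark = {"observational": observational, "do(X=1)": interventional}

obs_y_given_x1 = mean_y([r for r in observational if r["X"] == 1])
do_y_x1 = mean_y(interventional)
print("P(Y=1 | X=1)     estimate:", round(obs_y_given_x1, 3))
print("P(Y=1 | do(X=1)) estimate:", round(do_y_x1, 3))
```

Under these assumed parameters, the conditional frequency of Y given X=1 in the observational data is larger than the frequency of Y under do(X=1), because Z confounds X and Y. A causal inference method under test would see only `benchmark`, and its estimates could then be compared against quantities computed from the ground-truth model.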

The project will require a good understanding of graphical models and causality, as well as coding skills to build the tool.

References:

https://ftp.cs.ucla.edu/pub/stat_ser/r350-reprint.pdf

https://www.microsoft.com/en-us/research/video/tutorial-session-b-causes-and-counterfactuals-concepts-principles-and-tools/

https://arxiv.org/abs/2008.00463

https://arxiv.org/abs/2307.08304

Details

Supervisor: Cassio de Campos
Interested?: Get in contact