Deep reinforcement learning has been successfully used to drive a car in the Gran Turismo video game, outperforming human experts (Wurman et al., 2022). However, an open question remains: how should the car itself be tuned before a race?
This problem can be modeled as a configurable MDP (Metelli et al., 2018), where we can change some of the problem's parameters before an episode (race) starts. In this project, we study how to find the parameters that allow the RL agents to reach the best performance.
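To make the configurable-MDP view concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than part of the project: the `CarConfig` parameters, the toy lap-time formula in `run_episode`, and the fixed `aggressiveness` number that stands in for a trained policy. It only shows the structure of the problem: a configuration is chosen before the episode starts, and the best configuration depends on the agent that will drive.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class CarConfig:
    """Hypothetical tuning parameters, fixed before a race starts."""
    grip: float   # in [0, 1]; more grip allows faster cornering
    power: float  # in [0, 1]; more power allows faster straights


def run_episode(config: CarConfig, aggressiveness: float, rng: random.Random) -> float:
    """Toy stand-in for one race: returns a noisy negative lap time (higher is better)."""
    # Invented dynamics: an aggressive driver gains from power on the straights
    # but loses time in corners whenever it pushes harder than the grip allows.
    straight_time = 10.0 - 4.0 * config.power * aggressiveness
    corner_time = 10.0 + 6.0 * max(0.0, aggressiveness - config.grip)
    return -(straight_time + corner_time) + rng.gauss(0.0, 0.05)


def evaluate(config: CarConfig, aggressiveness: float, episodes: int = 50, seed: int = 0) -> float:
    """Average return of a fixed policy under one configuration."""
    # The same seed is reused for every configuration so that comparisons share the same noise.
    rng = random.Random(seed)
    return sum(run_episode(config, aggressiveness, rng) for _ in range(episodes)) / episodes


def best_config(candidates, aggressiveness: float) -> CarConfig:
    """Exhaustive search over a small, discrete set of configurations."""
    return max(candidates, key=lambda c: evaluate(c, aggressiveness))


# Assumed tuning budget: grip and power trade off against each other.
candidates = [CarConfig(grip=g / 10, power=1 - g / 10) for g in range(11)]

if __name__ == "__main__":
    for aggressiveness in (0.3, 0.8):
        print(aggressiveness, best_config(candidates, aggressiveness))
```

In this toy setting the best configuration changes with the driving style, which is why the configuration and the agent cannot be chosen independently. Exhaustive search is only feasible here because the candidate set is tiny and evaluation is cheap; with a real simulator, every evaluation would require training or running an RL agent.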
Recently, curriculum learning has been used to generate a sequence of tasks that speeds up the training of RL agents (Narvekar et al., 2020). Unsupervised environment design (Li et al., 2025) takes a similar approach: a teacher proposes configurations so that the underlying agent learns to operate in arbitrary configurations.
The challenge of this project is to find a way to improve this sequence of configurations when no target parameter is defined in advance.
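The teacher and agent interaction described above can be sketched as a simple loop. This is a toy illustration under stated assumptions, not the method of the cited papers: the student is reduced to a vector of per-configuration skill values, `train_student` is an invented update rule, and `teacher_propose` uses one assumed heuristic (propose the configuration where the student is currently weakest).

```python
def train_student(skills, config_index, lr=0.5):
    """Toy student update: practising one configuration improves it and,
    to a lesser degree, its immediate neighbours."""
    updated = list(skills)
    for offset, weight in ((-1, 0.5), (0, 1.0), (1, 0.5)):
        j = config_index + offset
        if 0 <= j < len(updated):
            # Move the skill a fraction of the way towards the maximum of 1.0.
            updated[j] += lr * weight * (1.0 - updated[j])
    return updated


def teacher_propose(skills):
    """Assumed heuristic: propose the configuration with the lowest current skill."""
    return min(range(len(skills)), key=lambda i: skills[i])


def run_curriculum(num_configs=5, rounds=20):
    """Alternate between the teacher proposing and the student training."""
    skills = [0.0] * num_configs  # one skill value in [0, 1) per configuration
    proposed = []
    for _ in range(rounds):
        i = teacher_propose(skills)
        proposed.append(i)
        skills = train_student(skills, i)
    return skills, proposed


if __name__ == "__main__":
    final_skills, sequence = run_curriculum()
    print(sequence)
    print([round(s, 3) for s in final_skills])
```

Note that this teacher aims at uniform competence across all configurations. It does not identify which single configuration gives the best performance.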
Photo by Rakesh Sitnoor on Unsplash.
Thiago Simão