In this project, we study the development of reinforcement learning (RL) for applications with many constraints.
Applying RL requires designing a reward function, which can be challenging in applications with many objectives. For instance, in autonomous driving, the RL agent should minimize the time to reach the destination, keep its speed under the limit, remain in a single lane, keep a sufficiently large distance to other vehicles, remain on the road, avoid switching lanes too often, maintain comfort and stability, avoid abrupt acceleration, braking, or steering, and follow the traffic rules. Specifying such complex behavior through the reward alone can be extremely difficult, as it requires finding a balance between all of these objectives. Moreover, the agent may find ways of exploiting the reward, leading to undesirable behaviors.
Specifying the behavior directly using constraints is an alternative to reward engineering. It requires less effort from the user and allows the agent to find a tradeoff between the objectives autonomously.
However, most work on constrained RL focuses on a single constraint.
In this project, we investigate how to handle problems with many constraints effectively.
Thiago Simão