A touristic recommender system (TRS; Dalla Vecchia et al., 2024; Gaonkar et al., 2018; de Nijs et al., 2018) often provides its users with a sequence of recommendations rather than a single suggestion, in order to optimize the user experience over the available time interval. Because of this sequential nature and the uncertainty about the system dynamics, reinforcement learning (RL; Sutton and Barto, 2018) has been used to optimize the suggested attractions. In this application, the system makes suggestions based on contextual information, such as the user’s preferences, the weather forecast, and the occupancy of the attractions. Carefully coordinated recommendations can improve the overall tourist experience because they prevent attractions from exceeding their capacity. In this way, the problem of sustainable tourism is taken into account (Yu and Egger, 2021).
On the one hand, prior methods have addressed the uncertainty in the model by using reinforcement learning to optimize the sequences of recommendations (Dalla Vecchia et al., 2024). On the other hand, to handle the capacity constraints, previous methods have considered planning in constrained Markov decision processes (CMDPs; de Nijs et al., 2018).
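For reference, a constrained Markov decision process can be written as the following optimization problem. This is a minimal sketch of the standard formulation, not the specific model of the cited works: the finite horizon $H$, the reward $r$, the per-attraction cost functions $c_k$, and the budgets $d_k$ are illustrative assumptions.

\[
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{H-1} r(s_t, a_t)\right]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{H-1} c_k(s_t, a_t)\right] \le d_k,
\qquad k = 1, \dots, K.
\]

Under these assumptions, $r$ would measure the satisfaction of a user with a recommended attraction, $c_k$ would measure the load that a recommendation places on attraction $k$, and $d_k$ would encode the capacity of that attraction.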
The goal is to optimize a TRS using multi-agent constrained reinforcement learning (RL; Sutton and Barto, 2018).