Nowadays, most software systems are configurable, meaning that their settings can be tailored to the specific needs of each user. Moreover, we may already have data available indicating each user's preferences and the software's performance under each configuration. From this data, we can compute a policy that selects the best configuration of the system for new users.
However, most systems have many parameters, so the size of the configuration space can grow exponentially [1]. With so many configurations, how can we make the best use of the available data? In particular, how can we predict the system's performance for a new user under configurations not seen in the dataset? A naive approach would split the data by configuration and user preferences. However, this uses the data inefficiently, since only certain aspects of a configuration may be relevant for each user, as the sketch below illustrates.
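To make the scale of the problem concrete, the following minimal sketch (with hypothetical numbers for the parameters and the dataset size) shows how quickly a per-configuration split of the data becomes uninformative.

```python
# A minimal sketch (hypothetical numbers) of why splitting the logged data per
# full configuration is inefficient: with n binary parameters there are 2**n
# configurations, so a fixed-size dataset leaves almost no samples per slice.
n_parameters = 20                 # hypothetical number of binary parameters
n_logged_interactions = 100_000   # hypothetical size of the available dataset

n_configurations = 2 ** n_parameters
samples_per_configuration = n_logged_interactions / n_configurations
print(f"{n_configurations} configurations, "
      f"{samples_per_configuration:.2f} samples per configuration on average")
# -> 1048576 configurations, 0.10 samples per configuration on average
```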
In this project, we investigate how to exploit the structure of the dependencies between the user preferences and the system configurations to improve the system's performance for unseen users. In particular, we cast the problem as a factored Markov decision process that can capture such dependencies explicitly [2]. This way, we can use offline reinforcement learning algorithms to compute a new policy for the system without collecting more data [3].
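As an illustration of this idea, the sketch below shows how a factored (DBN-style) transition model can be estimated from logged data; the feature names, parent scopes, and action are hypothetical examples rather than our actual system model. Because each feature's next value depends only on a small set of parent features, the conditional tables stay small and can be estimated from limited data even though the full configuration space is exponential, which is the structure offline reinforcement learning methods such as [3] exploit.

```python
# A minimal sketch of a factored transition model estimated from logged data;
# the feature names, parent scopes, and action below are hypothetical examples.
from collections import defaultdict

# Hypothetical binary state features: two user-preference features and two
# configuration parameters.
FEATURES = ["prefers_speed", "prefers_quality", "cache_on", "high_res"]

# Parent scope of each feature: the next value of a feature depends only on
# these current features (and the action), so each conditional table has at
# most 2**len(parents) * |actions| rows instead of one row per full state.
PARENTS = {
    "prefers_speed":   ["prefers_speed"],                 # preferences are static
    "prefers_quality": ["prefers_quality"],
    "cache_on":        ["cache_on"],                      # toggled by the action
    "high_res":        ["high_res", "prefers_quality"],   # depends on a preference
}

def estimate_factored_model(dataset):
    """Estimate P(feature' | parents, action) by counting over logged
    (state, action, next_state) transitions."""
    counts = {f: defaultdict(lambda: defaultdict(int)) for f in FEATURES}
    for state, action, next_state in dataset:
        for f in FEATURES:
            parent_values = tuple(state[p] for p in PARENTS[f])
            counts[f][(parent_values, action)][next_state[f]] += 1
    model = {}
    for f in FEATURES:
        model[f] = {}
        for key, outcomes in counts[f].items():
            total = sum(outcomes.values())
            model[f][key] = {value: n / total for value, n in outcomes.items()}
    return model

# One hypothetical logged transition: the action enables the cache.
dataset = [
    ({"prefers_speed": 1, "prefers_quality": 0, "cache_on": 0, "high_res": 0},
     "enable_cache",
     {"prefers_speed": 1, "prefers_quality": 0, "cache_on": 1, "high_res": 0}),
]
print(estimate_factored_model(dataset)["cache_on"])
# -> {((0,), 'enable_cache'): {1: 1.0}}
```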
[1] Chrszon, P., Baier, C., Dubslaff, C., and Klüppelholz, S. (2023). Interaction detection in configurable systems - A formal approach featuring roles. J. Syst. Softw., 196:111556.
[2] Boutilier, C., Dearden, R., and Goldszmidt, M. (2000). Stochastic dynamic programming with factored representations. Artificial Intelligence, 121(1-2):49-107.
[3] Simão, T. D. and Spaan, M. T. J. (2019). Safe Policy Improvement with Baseline Bootstrapping in Factored Environments. In AAAI, pages 4967-4974.