Motivation
In safety-critical domains such as autonomous driving, healthcare robotics, and industrial automation, autonomous agents must perform tasks not only efficiently but also safely. Traditional imitation learning enables agents to learn behaviors by mimicking expert demonstrations. However, these methods typically lack any mechanism for recognizing and avoiding unsafe actions, which can lead to catastrophic failures in real-world applications. Safe Contrastive Imitation Learning (SCIL) addresses this gap by utilizing both positive trajectories of safe behavior and negative trajectories of unsafe behavior. By contrasting these trajectories, agents learn to distinguish safe from unsafe actions, leading to more robust and reliable performance.
Challenge
The primary challenge in SCIL lies in effectively integrating negative trajectories into the learning process. Standard imitation learning algorithms focus on replicating observed behaviors without a mechanism to penalize or avoid unsafe actions explicitly. Incorporating negative demonstrations introduces complexities in defining appropriate loss functions and learning objectives that can balance the influence of safe and unsafe behaviors. Additionally, there is a risk of the agent overfitting to the provided demonstrations, failing to generalize to unseen scenarios where safety is critical. Developing a framework that can learn from both types of trajectories while ensuring generalization and scalability remains a significant hurdle.
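To make the balancing problem concrete, the sketch below combines a behavioral-cloning term on safe trajectories with a repulsive term on actions observed in unsafe trajectories. The discrete action space, the specific form of the repulsive term, and the weighting coefficient lambda_unsafe are illustrative assumptions, not the proposed design.

```python
import torch
import torch.nn.functional as F

def scil_loss(policy, safe_batch, unsafe_batch, lambda_unsafe=0.5):
    """Illustrative contrastive imitation loss (a sketch, not a finalized objective).

    safe_batch and unsafe_batch are dicts with 'obs' (float tensor) and
    'act' (long tensor) drawn from trajectories labeled safe and unsafe.
    policy(obs) is assumed to return action logits for a discrete action space.
    """
    # Attractive term: standard behavioral cloning on safe demonstrations.
    safe_logits = policy(safe_batch["obs"])
    bc_loss = F.cross_entropy(safe_logits, safe_batch["act"])

    # Repulsive term: decrease the likelihood of actions observed in unsafe
    # trajectories by minimizing -log(1 - pi(a|s)).
    unsafe_logits = policy(unsafe_batch["obs"])
    unsafe_logp = F.log_softmax(unsafe_logits, dim=-1)
    p_unsafe = unsafe_logp.gather(1, unsafe_batch["act"].unsqueeze(1)).squeeze(1).exp()
    repulsive = -torch.log1p(-p_unsafe.clamp(max=1.0 - 1e-6))

    # lambda_unsafe trades off imitation of safe behavior against
    # avoidance of unsafe behavior.
    return bc_loss + lambda_unsafe * repulsive.mean()
```

How to weight and shape the repulsive term without making the policy overly conservative is exactly the open question described above.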
Existing Methods
Existing imitation learning methods such as Behavioral Cloning [1] and GAIL [2] replicate expert behavior but lack mechanisms for handling negative examples, risking the reproduction of unsafe behavior. Safe RL methods introduce constraints but often require extensive, potentially risky interaction with the environment, which limits their use in real-world applications [3]. Offline safe RL avoids these interactions by learning from pre-collected data [4][5], but it struggles with distributional shift, inadequate representation of unsafe data, and overly conservative policies. Contrastive learning, effective for visual representation learning [6][7] and for state representation learning in RL [8][9], has not been extensively applied to distinguishing safe from unsafe behaviors. Current methods thus fail to jointly exploit positive and negative examples, underlining the need for new approaches that efficiently train safe autonomous agents.
Goal
The goal of this thesis is to develop a SCIL method that effectively leverages both positive and negative trajectories to train autonomous agents that imitate desirable behaviors while avoiding unsafe actions. This involves collecting rollouts from expert and greedy policies, labeling them as safe or unsafe, and balancing the resulting data. The policy will then be trained with a contrastive learning objective, aiming for performance that is comparable to or better than existing offline safe RL methods.
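As a rough illustration of this data pipeline, the sketch below labels rollouts by comparing their accumulated safety cost against a threshold and balances the two classes by subsampling. The per-step cost signal, the threshold, and the 1:1 ratio are illustrative assumptions rather than fixed design choices.

```python
import random
from dataclasses import dataclass

@dataclass
class Trajectory:
    observations: list
    actions: list
    costs: list  # per-step safety costs, assumed to be logged during rollout

def label_and_balance(trajectories, cost_threshold=0.0, seed=0):
    """Label rollouts as safe or unsafe and balance the two sets.

    A trajectory counts as unsafe if its accumulated cost exceeds
    cost_threshold; both this rule and the 1:1 subsampling ratio are
    illustrative choices, not fixed parts of the method.
    """
    safe = [t for t in trajectories if sum(t.costs) <= cost_threshold]
    unsafe = [t for t in trajectories if sum(t.costs) > cost_threshold]

    # Subsample the larger class so the contrastive objective sees roughly
    # equal numbers of positive and negative trajectories.
    rng = random.Random(seed)
    n = min(len(safe), len(unsafe))
    return rng.sample(safe, n), rng.sample(unsafe, n)
```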
Main Tasks
1. Conduct a comprehensive review of existing imitation learning and offline safe RL techniques and their limitations.
2. Formulate a theoretical framework for SCIL incorporating positive and negative trajectories, with appropriate loss functions and optimization strategies.
3. Collect and curate trajectory data from expert policies and from policies of varying quality, covering both safe and unsafe behaviors, to support training and evaluation.
4. Implement the SCIL framework, conduct experiments in safety-critical simulated environments using positive and negative trajectory data, and evaluate its performance and generalization against traditional imitation learning and offline safe RL baselines (an evaluation sketch follows this list).
5. Document the methodology, experimental process, results, and conclusions in a thesis.
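For task 4, the following sketch shows the kind of evaluation loop that could report task performance and accumulated safety cost side by side, assuming a Gymnasium-style environment that exposes a per-step cost through info["cost"] (as safety benchmarks in that style commonly do). The environment id and episode count are placeholders.

```python
import gymnasium as gym
import numpy as np

def evaluate(policy_fn, env_id="CartPole-v1", episodes=10, seed=0):
    """Roll out a policy and report average return and accumulated safety cost.

    policy_fn maps an observation to an action; reading the per-step cost
    from info.get('cost', 0.0) is an assumption about the benchmark API.
    """
    env = gym.make(env_id)
    returns, costs = [], []
    for ep in range(episodes):
        obs, _ = env.reset(seed=seed + ep)
        done, ep_return, ep_cost = False, 0.0, 0.0
        while not done:
            obs, reward, terminated, truncated, info = env.step(policy_fn(obs))
            ep_return += reward
            ep_cost += info.get("cost", 0.0)
            done = terminated or truncated
        returns.append(ep_return)
        costs.append(ep_cost)
    env.close()
    return float(np.mean(returns)), float(np.mean(costs))
```

Reporting return and accumulated cost together allows SCIL to be compared with offline safe RL baselines on both task performance and safety.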
References
[1] Pomerleau, D. A. (1989). ALVINN: An autonomous land vehicle in a neural network. In Advances in Neural Information Processing Systems (pp. 305-313).
[2] Ho, J., & Ermon, S. (2016). Generative Adversarial Imitation Learning. In Advances in Neural Information Processing Systems (pp. 4565-4573).
[3] García, J., & Fernández, F. (2015). A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research, 16(1), 1437-1480.
[4] Berkenkamp, F., Turchetta, M., Schoellig, A. P., & Krause, A. (2017). Safe model-based reinforcement learning with stability guarantees. In Advances in Neural Information Processing Systems (pp. 908-918).
[5] Jain, S., Tang, A., Pfrommer, B., Kumar, V., & Stone, P. (2021). Safe Reinforcement Learning with Dead-Ends Avoidance. In IEEE International Conference on Robotics and Automation (pp. 6081-6087).
[6] Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). A Simple Framework for Contrastive Learning of Visual Representations. In International Conference on Machine Learning (pp. 1597-1607).
[7] He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. (2020). Momentum Contrast for Unsupervised Visual Representation Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 9729-9738).
[8] Oord, A. v. d., Li, Y., & Vinyals, O. (2018). Representation Learning with Contrastive Predictive Coding. arXiv preprint arXiv:1807.03748.
[9] Srinivas, A., Laskin, M., & Abbeel, P. (2020). CURL: Contrastive Unsupervised Representations for Reinforcement Learning. In Proceedings of the 37th International Conference on Machine Learning (pp. 10757-10768).