Paper accepted at ICAPS-24
Imitation learning (IL) learns either a reward (or preference) model or the behavioral policy itself by observing the behavior of an expert. Unlike other popular policy-learning approaches such as reinforcement learning (RL), IL does not require explicitly specified reward signals, so it can generalize from very few observed expert trajectories.
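As a minimal sketch of one flavor of IL (behavioral cloning, not necessarily the method in the paper), a policy can be fit directly from expert state-action pairs with no reward signal at all. The corridor environment and all names below are hypothetical, chosen only for illustration:

```python
from collections import Counter, defaultdict

def behavioral_cloning(demos):
    """Fit a tabular policy by majority vote over expert actions per state.

    demos: list of (state, action) pairs observed from the expert.
    Returns a dict mapping each seen state to its most frequent expert action.
    """
    counts = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

# Hypothetical expert data: in a 1-D corridor, the expert always moves
# right (+1) until reaching state 4, then stays put (0).
expert_demos = [(0, +1), (1, +1), (2, +1), (3, +1), (4, 0),
                (0, +1), (1, +1), (2, +1), (3, +1), (4, 0)]

policy = behavioral_cloning(expert_demos)
print(policy[0])  # → 1
print(policy[4])  # → 0
```

Note that the learner never sees a reward: the expert's demonstrations alone define the target behavior, which is what lets IL work with very little data.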
In our ICAPS-24 work (Qian Shao, Pradeep Varakantham, and Shih-Fen Cheng), we extend the IL framework to account not only for rewards but also for cost constraints. Our approaches are evaluated in open-source environments such as Safety Gym and MuJoCo, where they outperform the current state of the art.
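To give a sense of what a cost constraint adds (this toy example is not the paper's algorithm), one can view the constrained problem as: among candidate policies, keep only those whose expected cost stays within a budget, then pick the one with the highest expected reward. The one-parameter policy, reward, and cost functions below are invented for illustration:

```python
def best_safe_policy(candidates, reward_fn, cost_fn, cost_limit):
    """Pick the highest-reward policy whose expected cost is within budget."""
    feasible = [p for p in candidates if cost_fn(p) <= cost_limit]
    return max(feasible, key=reward_fn)

# Hypothetical one-parameter policy: p = probability of a fast but risky action.
candidates = [i / 100 for i in range(101)]
reward = lambda p: 2.0 * p   # acting faster yields more reward...
cost = lambda p: 3.0 * p     # ...but also incurs more safety cost

p_star = best_safe_policy(candidates, reward, cost, cost_limit=1.5)
print(p_star)  # → 0.5
```

Without the constraint, the reward-maximizing choice would be p = 1.0; the cost budget pulls the solution back to the largest still-safe value, which is the trade-off that constrained IL must learn from demonstrations rather than from hand-specified functions.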
Our research will be presented by Qian Shao in Session 1 (Reinforcement Learning), 10:00 on June 4, in KC 201. She will also be at the Poster Session, 16:30-18:00 on June 4.