Our paper, LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning, was accepted at the International Conference on Learning Representations (ICLR)! We achieve fast and stable inverse reinforcement learning by using a squared reward regularizer on a mixture distribution between the expert and the policy distribution. We show that this specific choice of regularizer results in a bounded divergence, a bounded optimal reward function, and a bounded Q-function. This stands in stark contrast to the previously used regularizer, which resulted in an unbounded reward function and caused instability. We also show that this regularizer gives a unique reinforcement learning perspective on the original objective.
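As an illustrative sketch (the notation here is ours for this post: α is a regularization weight, and ρ_E and ρ_π denote the expert and policy state-action distributions), the regularizer penalizes the squared reward under the expert-policy mixture:

```latex
% Illustrative form of the squared reward regularizer on the mixture distribution.
% Notation is ours: \alpha is a regularization weight, \rho_E and \rho_\pi are the
% expert and policy state-action distributions.
\psi(r) \;=\; \alpha \,
  \mathbb{E}_{(s,a) \sim \frac{1}{2}\left(\rho_E + \rho_\pi\right)}
  \left[ r(s,a)^2 \right]
```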
We evaluate our approach on complex locomotion tasks, including on the Atlas humanoid robot.
Interested? You can find our paper here, and our code is available in our GitHub repo.