Maximum Entropy Inverse Reinforcement Learning: Algorithms and Computation
In the previous post we introduced inverse reinforcement learning. We defined the problem that is associated with this field, which is that of reconstructing a reward function given a set of demonstrations, and we saw what the ability to do this implies. In addition to this, we also saw came across some classification results as well as convergence guarantees from selected methods that were simply referred to in the post. There were some challenges with the classification results that we discussed, and although there were attempts to deal with these, there is still quite a lot that we did not talk about.
Maximum Entropy Inverse Reinforcement Learning
We shall now introduce a probabilistic approach based on what is known as the principle of maximum entropy, and this provides a well defined globally normalised distribution over decision sequences, while providing the same performance assurances as previously mentioned methods. This probabilistic approach allows moderate reasoning about uncertainty in the setting inverse reinforcement learning, and the assumptions further limits the space in which we search for solutions which we saw, last time, was quite massive.…