Publication
Recurrent policy gradients
Daan Wierstra; Alexander Förster; Jan Peters; Jürgen Schmidhuber
In: Logic Journal of the IGPL Oxford, Vol. 18, No. 5, Pages 620-634, Oxford University Press, 2010.
Abstract
Reinforcement learning for partially observable Markov decision problems (POMDPs) is challenging because it requires policies with internal state. Traditional approaches suffer significantly from this requirement and usually make strong assumptions about the problem domain, such as perfect system models, state estimators, and a Markovian hidden system. Recurrent neural networks (RNNs) offer a natural framework for learning policies with hidden state and require only a few limiting assumptions. Since they can be trained effectively by gradient descent, they are well suited to policy gradient approaches.
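To illustrate the idea of a recurrent policy trained by a policy gradient, the following is a minimal sketch, not the paper's algorithm: a reservoir-style simplification in which the recurrent and input weights are fixed random matrices and only a softmax readout is trained with REINFORCE. The toy POMDP, the network sizes, and all variable names are assumptions chosen for brevity. The task requires memory: a hidden bit is shown in the observation only at the first time step, and the reward at the end of the episode depends on whether the final action matches that bit.

```python
import numpy as np

rng = np.random.default_rng(0)

H, O, A, T = 16, 2, 2, 3  # hidden units, observation dim, actions, episode length

# Fixed random input/recurrent weights (reservoir-style simplification);
# only the softmax readout W_out is trained. This keeps the gradient trivial
# while the hidden state still has to carry the cue across time steps.
W_in = rng.normal(0.0, 1.0, (H, O))
W_rec = rng.normal(0.0, 1.0 / np.sqrt(H), (H, H))
W_out = np.zeros((A, H))


def episode(train=True, lr=0.5):
    """Run one episode; if train, apply a REINFORCE update to W_out."""
    global W_out
    b = rng.integers(2)                # hidden bit, visible only at t == 0
    h = np.zeros(H)
    grads = []
    for t in range(T):
        obs = np.zeros(O)
        if t == 0:
            obs[b] = 1.0               # partial observability: cue at t == 0 only
        h = np.tanh(W_in @ obs + W_rec @ h)
        logits = W_out @ h
        p = np.exp(logits - logits.max())
        p /= p.sum()
        a = rng.choice(A, p=p)         # sample action from the softmax policy
        onehot = np.zeros(A)
        onehot[a] = 1.0
        grads.append(np.outer(onehot - p, h))  # d log pi(a | h) / d W_out
    r = 1.0 if a == b else 0.0         # terminal reward: final action matches bit
    if train:
        for g in grads:
            W_out += lr * r * g        # REINFORCE: R * grad log pi, no baseline
    return r


for _ in range(2000):
    episode()
avg = np.mean([episode(train=False) for _ in range(500)])
```

After training, the average terminal reward approaches 1, showing that the recurrent hidden state preserves the initial cue and the readout learns to act on it. Training the recurrent weights themselves, as in the paper, would additionally require backpropagation through time on the same log-likelihood terms.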