Publication
Recurrent policy gradients
Daan Wierstra; Alexander Förster; Jan Peters; Jürgen Schmidhuber
In: Logic Journal of the IGPL Oxford, Vol. 18, No. 5, Pages 620-634, Oxford University Press, 2010.
Abstract
Reinforcement learning for partially observable Markov decision problems (POMDPs) is challenging because it requires policies with internal state. Traditional approaches suffer significantly from this requirement and usually make strong assumptions about the problem domain, such as perfect system models, state estimators, and a Markovian hidden system. Recurrent neural networks (RNNs) offer a natural framework for learning policies with hidden state and require only a few limiting assumptions. Since they can be trained effectively by gradient descent, they are well suited to policy gradient approaches.
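To illustrate the idea of a recurrent policy trained by a policy gradient, the following is a minimal sketch, not the paper's algorithm: a reservoir-style simplification in which the recurrent and input weights are fixed random matrices and only a softmax readout is trained with REINFORCE. The toy POMDP, the network sizes, and all variable names are assumptions chosen for brevity. The task requires memory: a hidden bit is shown in the observation only at the first time step, and the reward at the end of the episode depends on whether the final action matches that bit.

```python
import numpy as np

rng = np.random.default_rng(0)

H, O, A, T = 16, 2, 2, 3  # hidden units, observation dim, actions, episode length

# Fixed random input/recurrent weights (reservoir-style simplification);
# only the softmax readout W_out is trained. This keeps the gradient trivial
# while the hidden state still has to carry the cue across time steps.
W_in = rng.normal(0.0, 1.0, (H, O))
W_rec = rng.normal(0.0, 1.0 / np.sqrt(H), (H, H))
W_out = np.zeros((A, H))


def episode(train=True, lr=0.5):
    """Run one episode; if train, apply a REINFORCE update to W_out."""
    global W_out
    b = rng.integers(2)                # hidden bit, visible only at t == 0
    h = np.zeros(H)
    grads = []
    for t in range(T):
        obs = np.zeros(O)
        if t == 0:
            obs[b] = 1.0               # partial observability: cue at t == 0 only
        h = np.tanh(W_in @ obs + W_rec @ h)
        logits = W_out @ h
        p = np.exp(logits - logits.max())
        p /= p.sum()
        a = rng.choice(A, p=p)         # sample action from the softmax policy
        onehot = np.zeros(A)
        onehot[a] = 1.0
        grads.append(np.outer(onehot - p, h))  # d log pi(a | h) / d W_out
    r = 1.0 if a == b else 0.0         # terminal reward: final action matches bit
    if train:
        for g in grads:
            W_out += lr * r * g        # REINFORCE: R * grad log pi, no baseline
    return r


for _ in range(2000):
    episode()
avg = np.mean([episode(train=False) for _ in range(500)])
```

After training, the average terminal reward approaches 1, showing that the recurrent hidden state preserves the initial cue and the readout learns to act on it. Training the recurrent weights themselves, as in the paper, would additionally require backpropagation through time on the same log-likelihood terms.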