Learning of Non-Parametric Control Policies with High-Dimensional State Features
Herke van Hoof; Jan Peters; Gerhard Neumann
In: Guy Lebanon; S. V. N. Vishwanathan (Eds.). Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. International Conference on Artificial Intelligence and Statistics (AISTATS-2015), May 9-12, San Diego, California, USA, JMLR Workshop and Conference Proceedings, Vol. 38, JMLR.org, 2015.
Learning complex control policies from high-dimensional sensory input is a challenge for reinforcement learning algorithms. Kernel methods that approximate value functions or transition models can address this problem. Yet, many current approaches rely on unstable greedy maximization. In this paper, we develop a policy search algorithm that integrates robust policy updates and kernel embeddings. Our method can learn non-parametric control policies for infinite-horizon continuous MDPs with high-dimensional sensory representations. We show that our method outperforms related approaches, and that our algorithm can learn an underpowered swing-up task directly from high-dimensional image data.