

Efficient Sample Reuse in EM-Based Policy Search

Hirotaka Hachiya; Jan Peters; Masashi Sugiyama
In: Wray L. Buntine; Marko Grobelnik; Dunja Mladenic; John Shawe-Taylor (Eds.). Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2009, Proceedings, Part I. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD-2009), September 7-11, Bled, Slovenia, Pages 469-484, Lecture Notes in Artificial Intelligence (LNAI), Vol. 5781, Springer, 2009.


Direct policy search is a promising reinforcement learning framework, particularly for controlling continuous, high-dimensional systems such as anthropomorphic robots. Because of its high flexibility, policy search often requires a large number of samples to obtain a stable estimator of the policy update. This is prohibitive when sampling is expensive. In this paper, we extend an EM-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, called (R), is demonstrated through a robot learning experiment.
