Algorithms for Learning Markov Field Policies

Abdeslam Boularias; Oliver Kroemer; Jan Peters
In: Peter L. Bartlett; Fernando C. N. Pereira; Christopher J. C. Burges; Léon Bottou; Kilian Q. Weinberger (Eds.). Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Neural Information Processing Systems (NeurIPS-2012), December 3-6, Lake Tahoe, Nevada, USA, Pages 2186-2194, Curran Associates, Inc. 2012.


We present a new graph-based approach for incorporating domain knowledge in reinforcement learning applications. The domain knowledge is given as a weighted graph, or a kernel matrix, that loosely indicates which states should have similar optimal actions. We first introduce a bias into the policy search process by deriving a distribution on policies such that policies that disagree with the provided graph have low probabilities. This distribution corresponds to a Markov Random Field. We then present a reinforcement learning algorithm and an apprenticeship learning algorithm for finding such policy distributions. We also illustrate the advantage of the proposed approach on three problems: swing-up cart-balancing with nonuniform and smooth frictions, gridworlds, and teaching a robot to grasp new objects.
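
To make the idea of a graph-biased policy distribution concrete, the following toy sketch scores deterministic policies with a Potts-style pairwise energy: each pair of states whose actions differ is penalized by the weight of the edge between them. This is an illustrative assumption and not the formulation derived in the paper. The names `disagreement_energy`, `unnormalized_log_prob`, and `temperature`, as well as the small matrix `K`, are hypothetical.

```python
import numpy as np


def disagreement_energy(K, actions):
    """Sum of edge weights K[i, j] over state pairs whose actions differ.

    K is assumed to be a symmetric, non-negative similarity (kernel) matrix
    over states; actions[i] is the discrete action chosen in state i.
    """
    actions = np.asarray(actions)
    differs = actions[:, None] != actions[None, :]
    # Each undirected pair appears twice in the symmetric matrix, hence 0.5.
    return 0.5 * float(np.sum(K * differs))


def unnormalized_log_prob(K, actions, temperature=1.0):
    """Log of a Gibbs-style score: lower energy means higher probability."""
    return -disagreement_energy(K, actions) / temperature


# Hypothetical three-state example: states 0 and 1 are strongly similar,
# states 1 and 2 only weakly so.
K = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.2],
              [0.0, 0.2, 0.0]])

smooth = [0, 0, 1]  # agrees with the strong edge (0, 1)
rough = [0, 1, 1]   # disagrees with the strong edge (0, 1)

print(unnormalized_log_prob(K, smooth))
print(unnormalized_log_prob(K, rough))
```

Under these assumptions, the policy that assigns the same action to the strongly connected states receives the higher score, which is the qualitative behavior the abstract describes: policies that disagree with the provided graph have low probability.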
