Policy evaluation with temporal differences: a survey and comparison

Christoph Dann; Gerhard Neumann; Jan Peters
In: Journal of Machine Learning Research, Vol. 15, No. 1, Pages 809-883, JMLR, 2014.


Policy evaluation is an essential step in most reinforcement learning approaches. It yields a value function, the quality assessment of states for a given policy, which can be used in a policy improvement step. Since the late 1980s, this research area has been dominated by temporal-difference (TD) methods due to their data-efficiency. However, core issues such as stability guarantees in the off-policy scenario, improved sample efficiency and probabilistic treatment of the uncertainty in the estimates have only been tackled recently, which has led to a large number of new approaches.
