

Policy Evaluation with Temporal Differences: A Survey and Comparison (Extended Abstract)

Christoph Dann; Gerhard Neumann; Jan Peters
In: Ronen I. Brafman; Carmel Domshlak; Patrik Haslum; Shlomo Zilberstein (Eds.). Proceedings of the Twenty-Fifth International Conference on Automated Planning and Scheduling. International Conference on Automated Planning and Scheduling (ICAPS-2015), June 7-11, Jerusalem, Israel, Pages 359-360, AAAI Press, 2015.


Policy evaluation is an essential step in most reinforcement learning approaches. It yields a value function, a quality assessment of states under a given policy, which can then be used in a policy improvement step. Since the late 1980s, this research area has been dominated by temporal-difference (TD) methods due to their data efficiency. However, core issues such as stability guarantees in the off-policy scenario, improved sample efficiency, and a probabilistic treatment of the uncertainty in the estimates have only been tackled recently, which has led to a large number of new approaches.
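To make the setting concrete, here is a minimal sketch of tabular TD(0) policy evaluation on a toy random-walk chain. The MDP (a five-state chain), the fixed uniform policy, and the step-size choice are illustrative assumptions, not taken from the survey; the TD(0) update itself is the standard one.

```python
import random

def td0_policy_evaluation(episodes=5000, alpha=0.1, gamma=1.0, seed=0):
    """Tabular TD(0) on a 5-state chain: states 0..4, episodes start in
    state 2, a uniform-random policy steps left or right, reaching state 4
    yields reward 1 and state 0 reward 0; both ends are terminal."""
    rng = random.Random(seed)
    V = [0.0] * 5  # value estimates; terminal states stay at 0
    for _ in range(episodes):
        s = 2
        while s not in (0, 4):
            s_next = s + (1 if rng.random() < 0.5 else -1)
            r = 1.0 if s_next == 4 else 0.0
            # Bootstrap from the successor state unless it is terminal
            target = r if s_next in (0, 4) else r + gamma * V[s_next]
            V[s] += alpha * (target - V[s])  # TD(0) update
            s = s_next
    return V

V = td0_policy_evaluation()
```

For this chain the true values of states 1, 2, 3 under the uniform policy are 0.25, 0.5, 0.75, and the estimates converge toward them up to step-size noise; the data efficiency mentioned above comes from updating after every single transition rather than waiting for complete returns.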
