A Non-Parametric Approach to Dynamic Programming
Oliver Kroemer; Jan Peters
In: John Shawe-Taylor; Richard S. Zemel; Peter L. Bartlett; Fernando C. N. Pereira; Kilian Q. Weinberger (Eds.). Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Neural Information Processing Systems (NeurIPS-2011), December 12-14, Granada, Spain, Pages 1719-1727, Curran Associates, Inc. 2011.
In this paper, we consider the problem of policy evaluation for continuous-state systems. We present a non-parametric approach to policy evaluation, which uses kernel density estimation to represent the system. The true form of the value function for this model can be determined, and can be computed using Galerkin's method. Furthermore, we also present a unified view of several well-known policy evaluation methods. In particular, we show that the same Galerkin method can be used to derive Least-Squares Temporal Difference learning, Kernelized Temporal Difference learning, and a discrete-state Dynamic Programming solution, as well as our proposed method. In a numerical evaluation of these algorithms, the proposed approach performed better than the other methods.
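To make the policy evaluation setting concrete, here is a minimal sketch of Least-Squares Temporal Difference learning, one of the methods the abstract names. It uses the standard textbook form of LSTD, not the paper's Galerkin derivation or its proposed non-parametric method. The helper names (`solve_linear`, `lstd`, `one_hot`), the two-state chain, and the discount factor are all illustrative assumptions, not taken from the paper.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b] so row operations apply to both sides.
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for k in range(col, n + 1):
                M[row][k] -= factor * M[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def lstd(samples, features, gamma):
    """Return weights theta such that V(s) is approximated by theta . features(s).

    samples: (state, reward, next_state) transitions gathered under a fixed policy.
    """
    n = len(features(samples[0][0]))
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for s, r, s_next in samples:
        phi, phi_next = features(s), features(s_next)
        for i in range(n):
            b[i] += phi[i] * r
            for j in range(n):
                # A accumulates phi(s) (phi(s) - gamma * phi(s'))^T
                A[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve_linear(A, b)


def one_hot(s):
    """Hypothetical tabular features for a two-state toy problem."""
    return [1.0 if s == k else 0.0 for k in range(2)]


# Toy chain (an assumption for illustration): state 0 moves to state 1 with
# reward 0; state 1 loops on itself with reward 1.
samples = [(0, 0.0, 1), (1, 1.0, 1)]
theta = lstd(samples, one_hot, gamma=0.5)
print(theta)
```

With one-hot features the weights are the state values themselves, so the result can be checked by hand against the Bellman equation: V(1) = 1 / (1 - 0.5) = 2 and V(0) = 0.5 * V(1) = 1. For a continuous-state system, `one_hot` would be replaced by a feature map over the state space, such as radial basis functions.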