Approximate Value Iteration Based on Numerical Quadrature
Julia Vinogradska; Bastian Bischoff; Jan Peters
In: IEEE Robotics and Automation Letters (RA-L), Vol. 3, No. 2, Pages 1330-1337, IEEE, 2018.
Learning control policies has become an appealing alternative to deriving control laws from classical control theory. Value iteration approaches have proven highly flexible while maintaining high data efficiency when combined with probabilistic models to eliminate model bias. However, a major difficulty for these methods is that the state and action spaces must typically be discretized, and the value function update is often analytically intractable. In this letter, we propose a projection-based approximate value iteration approach that employs numerical quadrature for the value function update step. It can handle continuous state and action spaces and noisy measurements of the system dynamics while learning globally optimal control from scratch. In addition, the proposed approximation technique allows for upper bounds on the approximation error, which can be used to guarantee convergence of the proposed approach to an optimal policy under some assumptions. Empirical evaluations on the mountain car benchmark problem show the efficiency of the proposed approach and support our theoretical results.
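
To illustrate the general idea of replacing an intractable expectation in the value function update with numerical quadrature, here is a minimal Python sketch. It is not the letter's algorithm: it assumes a one-dimensional state, additive Gaussian noise on the next state, Gauss-Hermite quadrature as the rule, and a finite set of candidate actions in place of a continuous action space. The names `expected_value`, `bellman_backup`, `dynamics`, and `reward` are hypothetical and chosen only for this example.

```python
import numpy as np


def expected_value(v, mean, std, n_nodes=20):
    """Approximate E[v(s')] for s' ~ N(mean, std^2) with Gauss-Hermite quadrature."""
    # Nodes and weights for the weight function exp(-x^2).
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    # Change of variables maps the Gaussian expectation onto that weight function.
    samples = mean + np.sqrt(2.0) * std * nodes
    return float(np.dot(weights, v(samples)) / np.sqrt(np.pi))


def bellman_backup(v, state, actions, dynamics, reward, std, gamma=0.95):
    """One value update at a single state.

    Assumption: the maximum is taken over a finite list of candidate actions,
    and the next state is dynamics(state, a) plus zero-mean Gaussian noise.
    """
    return max(
        reward(state, a) + gamma * expected_value(v, dynamics(state, a), std)
        for a in actions
    )
```

For a polynomial value function the quadrature is exact up to rounding, so `expected_value(lambda s: s**2, 1.0, 2.0)` agrees with the closed form mean² + std² = 5. For general value functions the result is an approximation whose accuracy depends on the number of nodes and on how smooth the value function is.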