Skip to main content Skip to main navigation


Generalizing Movements with Information-Theoretic Stochastic Optimal Control

Rudolf Lioutikov; Alexandros Paraschos; Jan Peters; Gerhard Neumann
In: Journal of Aerospace Information Systems, Vol. 11, No. 9, Pages 579-595, American Institute of Aeronautics and Astronautics, 10/2014.


Stochastic optimal control is typically used to plan a movement for a specific situation. Although most stochastic optimal control methods fail to generalize this movement plan to a new situation without replanning, a stochastic optimal control method is presented that allows reuse of the obtained policy in a new situation, as the policy is more robust to slight deviations from the initial movement plan. To improve the robustness of the policy, we employ information-theoretic policy updates that explicitly operate on trajectory distributions instead of single trajectories. To ensure a stable and smooth policy update, the ”distance” is limited between the trajectory distributions of the old and the new control policies. The introduced bound offers a closed-form solution for the resulting policy and extends results from recent developments in stochastic optimal control. In contrast to many standard stochastic optimal control algorithms, the current approach can directly infer the system dynamics from data points, and hence can also be used for model-based reinforcement learning. This paper represents an extension of the paper by Lioutikov et al. (“Sample-Based Information-Theoretic Stochastic Optimal Control,” Proceedings of 2014 IEEE International Conference on Robotics and Automation (ICRA), IEEE, Piscataway, NJ, 2014, pp. 3896–3902). In addition to revisiting the content, an extensive theoretical comparison is presented of the approach with related work, additional aspects of the implementation are discussed, and further evaluations are introduced.

Weitere Links