

On Optimal Behavior Under Uncertainty in Humans and Robots

Boris Belousov
PhD thesis, Technische Universität Darmstadt, 2022.


Despite significant progress in robotics and automation in recent decades, there still remains a noticeable gap in performance compared to humans. Although computational capabilities are growing every year, and are even projected to exceed the capacities of biological systems, the behaviors generated using current computational paradigms are arguably not keeping pace with the available resources. Why is that? It appears that we still lack some fundamental understanding of how living organisms make decisions, and are therefore unable to replicate intelligent behavior in artificial systems. In this thesis, we therefore attempted to develop a framework for modeling human and robot behavior based on statistical decision theory. Different features of this approach, such as risk-sensitivity, exploration, learning, and control, were investigated in a number of publications. First, we considered the problem of learning new skills and developed a framework of entropic regularization of Markov decision processes (MDPs). Utilizing a generalized concept of entropy, we were able to realize the trade-off between exploration and exploitation via the choice of a single scalar parameter determining the divergence function. Second, building on the theory of partially observable Markov decision processes (POMDPs), we proposed and validated a model of human ball-catching behavior. Crucially, information-seeking behavior was identified as a key feature enabling the modeling of observed human catches. Thus, entropy reduction was seen to play an important role in skillful human behavior. Third, having extracted the modeling principles from human behavior and having developed an information-theoretic framework for reinforcement learning, we studied real-robot applications of learning-based controllers in tactile-rich manipulation tasks.
We investigated vision-based tactile sensors and the capability of learning algorithms to autonomously extract task-relevant features for manipulation tasks. A specific feature of tactile-based control, namely that perception and action are tightly connected at the point of contact, enabled us to gather insights into the strengths and limitations of the statistical learning approach to real-time robotic manipulation. In conclusion, this thesis presents a series of investigations into the applicability of the statistical decision theory paradigm to modeling the behavior of humans and synthesizing the behavior of robots. We conclude that a number of important features related to information processing can be represented and utilized in artificial systems for generating more intelligent behaviors. Nevertheless, these are only the first steps, and we acknowledge that the road towards artificial general intelligence and skillful robotic applications will require more innovations and potentially a transcendence of the probabilistic modeling paradigm.
