Publication
Value-Distributional Model-Based Reinforcement Learning
C.E. Luis; A.G. Bottero; J. Vinogradska; F. Berkenkamp; J. Peters
In: Journal of Machine Learning Research (JMLR), Vol. 25, No. 298, pp. 1-42, 2024.
Abstract
Quantifying uncertainty about a policy’s long-term performance is important to solve
sequential decision-making tasks. We study the problem from a model-based Bayesian
reinforcement learning perspective, where the goal is to learn the posterior distribution
over value functions induced by parameter (epistemic) uncertainty of the Markov decision
process. Previous work restricts the analysis to a few moments of the distribution over
values or imposes a particular distribution shape, e.g., Gaussians. Inspired by distributional
reinforcement learning, we introduce a Bellman operator whose fixed point is the value
distribution function. Based on our theory, we propose Epistemic Quantile-Regression
(EQR), a model-based algorithm that learns a value distribution function. We combine EQR
with soft actor-critic (SAC) for policy optimization with an arbitrary differentiable objective
function of the learned value distribution. Evaluation across several continuous-control tasks
shows performance benefits over both model-based and model-free algorithms.
The code is available at https://github.com/boschresearch/dist-mbrl.
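The abstract describes two technical ingredients: a Bellman operator whose fixed point is the value distribution induced by parameter (epistemic) uncertainty of the MDP, and a quantile-regression scheme (EQR) for learning that distribution. The sketch below is a rough illustration only, not the authors' implementation (see the linked repository for that): it shows a standard quantile-regression Huber loss applied to Bellman targets sampled from an ensemble of learned models. The names ValueQuantileNet and quantile_huber_loss, the network sizes, and the target construction are assumptions made for this example.

import torch
import torch.nn as nn

class ValueQuantileNet(nn.Module):
    # Predicts N quantiles of the value distribution V(s) induced by
    # epistemic (model) uncertainty. Hypothetical architecture for illustration.
    def __init__(self, state_dim: int, n_quantiles: int = 32, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_quantiles),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)  # shape: (batch, n_quantiles)

def quantile_huber_loss(pred: torch.Tensor, target: torch.Tensor,
                        taus: torch.Tensor, kappa: float = 1.0) -> torch.Tensor:
    # Quantile-regression loss (Huber variant), as commonly used in
    # distributional RL.
    #   pred:   (batch, N) predicted quantiles of the value distribution
    #   target: (batch, M) sampled Bellman targets, one per sampled model
    #   taus:   (N,) quantile fractions in (0, 1)
    td = target.unsqueeze(1) - pred.unsqueeze(2)            # (batch, N, M)
    huber = torch.where(td.abs() <= kappa,
                        0.5 * td.pow(2),
                        kappa * (td.abs() - 0.5 * kappa))
    weight = (taus.view(1, -1, 1) - (td.detach() < 0).float()).abs()
    return (weight * huber).mean()

# Example usage (shapes only): in a model-based setting the targets would be
# r + gamma * V(s') computed under each member of a learned model ensemble.
net = ValueQuantileNet(state_dim=4, n_quantiles=32)
taus = (torch.arange(32, dtype=torch.float32) + 0.5) / 32
states = torch.randn(8, 4)
targets = torch.randn(8, 5)   # 5 sampled Bellman targets per state (placeholder values)
loss = quantile_huber_loss(net(states), targets, taus)
loss.backward()

Any differentiable functional of the learned quantiles (mean, a lower quantile, or a risk-adjusted combination) could then serve as the critic signal for an SAC-style policy update, which is the role the abstract assigns to the learned value distribution.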