Publication
DIME: Diffusion-Based Maximum Entropy Reinforcement Learning
Onur Celik; Zechu Li; Denis Blessing; Ge Li; Daniel Palenicek; Jan Peters; Georgia Chalvatzaki; Gerhard Neumann
In: Computing Research Repository (CoRR), vol. abs/2502.02316, pp. 1-20, arXiv, 2025.
Abstract
Maximum entropy reinforcement learning (MaxEnt-RL) has become the standard approach to RL due to its beneficial exploration properties. Traditionally, policies are parameterized using Gaussian distributions, which significantly limits their representational capacity. Diffusion-based policies offer a more expressive alternative, yet integrating them into MaxEnt-RL poses challenges, primarily due to the intractability of computing their marginal entropy. To overcome this, we propose Diffusion-Based Maximum Entropy RL (DIME). DIME leverages recent advances in approximate inference with diffusion models to derive a lower bound on the maximum entropy objective. Additionally, we propose a policy iteration scheme that provably converges to the optimal diffusion policy. Our method enables the use of expressive diffusion-based policies while retaining the principled exploration benefits of MaxEnt-RL, significantly outperforming other diffusion-based methods on challenging high-dimensional control benchmarks. It is also competitive with state-of-the-art non-diffusion-based RL methods while requiring fewer algorithmic design choices and smaller update-to-data ratios, reducing computational complexity.
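For reference, the maximum entropy objective the abstract builds on is the standard soft-RL objective (as used, e.g., in soft actor-critic); the notation below is a generic rendering, not taken from the paper itself:

```latex
J(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\Big(r(s_t, a_t) \;+\; \alpha\,\mathcal{H}\big(\pi(\cdot \mid s_t)\big)\Big)\right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) \;=\; -\,\mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[\log \pi(a \mid s)\big].
```

For a diffusion policy, log pi(a|s) is a marginal over the full denoising chain and has no closed form; this is the intractability the abstract points to, and the reason DIME works with a lower bound on the objective rather than the exact entropy.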

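To make the diffusion-policy mechanism concrete, below is a minimal sketch of DDPM-style ancestral sampling for action generation. This is not DIME's actual implementation: the NoisePredictor architecture, the step count T, and the beta schedule are illustrative assumptions. It shows why the marginal entropy is hard to compute: the sampled action is produced by marginalizing over an entire chain of latent noisy actions.

```python
import torch
import torch.nn as nn


class NoisePredictor(nn.Module):
    """Hypothetical MLP that predicts the injected noise from (state, noisy action, step)."""

    def __init__(self, state_dim, action_dim, hidden=128, T=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )
        self.T = T

    def forward(self, state, noisy_action, t):
        # Normalize the diffusion step to [0, 1] before feeding it to the network.
        t_feat = t.float().unsqueeze(-1) / self.T
        return self.net(torch.cat([state, noisy_action, t_feat], dim=-1))


@torch.no_grad()
def sample_action(model, state, T=10, betas=None):
    """DDPM-style ancestral sampling: start from Gaussian noise and denoise step by step.

    The action's log-density would require integrating over all intermediate
    a_T, ..., a_1, which is why the policy's marginal entropy is intractable.
    """
    if betas is None:
        betas = torch.linspace(1e-4, 0.2, T)  # illustrative schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    # Start from pure noise in action space.
    a = torch.randn(state.shape[0], model.net[-1].out_features)
    for t in reversed(range(T)):
        eps = model(state, a, torch.full((state.shape[0],), t))
        # Posterior mean of the reverse transition (standard DDPM update).
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        a = (a - coef * eps) / torch.sqrt(alphas[t])
        if t > 0:
            a = a + torch.sqrt(betas[t]) * torch.randn_like(a)
    return a


if __name__ == "__main__":
    policy = NoisePredictor(state_dim=3, action_dim=2)
    actions = sample_action(policy, torch.randn(5, 3))
    print(actions.shape)  # torch.Size([5, 2])
```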