Publication
Massively Scaling Explicit Policy-conditioned Value Functions
Nico Bohlinger; Jan Peters
In: Computing Research Repository (CoRR), Vol. abs/2502.11949, Pages 1-5, arXiv, 2025.
Abstract
We introduce a scaling strategy for Explicit Policy-Conditioned Value Functions (EPVFs) that significantly improves
performance on challenging continuous-control tasks. EPVFs learn a value function V(θ) that is explicitly conditioned
on the policy parameters, enabling direct gradient-based updates to the parameters of any policy. At scale, however,
EPVFs struggle with unrestricted parameter growth and with efficient exploration in the policy parameter space. To
address these issues, we use massive parallelization with GPU-based simulators, large batch sizes, weight clipping, and
scaled perturbations. Our results show that EPVFs can be scaled to solve complex tasks, such as a custom Ant
environment, and can compete with state-of-the-art Deep Reinforcement Learning (DRL) baselines such as Proximal Policy
Optimization (PPO) and Soft Actor-Critic (SAC). We further explore action-based policy parameter representations from
previous work, as well as specialized neural network architectures for efficiently handling weight-space features,
which have not previously been used in the context of DRL.
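
To make the core mechanism concrete, the following is a minimal sketch of an EPVF update loop, assuming PyTorch. The network sizes, learning rates, perturbation scale, batch size, and the toy return function are illustrative assumptions, not the paper's implementation.

# Minimal sketch of an EPVF update loop, assuming PyTorch. All names,
# network sizes, and hyperparameters are illustrative assumptions, not
# the paper's implementation.
import torch
import torch.nn as nn

theta_dim = 256       # assumed size of the flattened policy parameter vector
sigma = 0.05          # assumed scale of the parameter-space perturbations
clip = 1.0            # assumed weight-clipping bound
batch_size = 1024     # stand-in for the paper's large batch sizes

# V(theta): maps a flattened policy parameter vector to a scalar value.
value_net = nn.Sequential(
    nn.Linear(theta_dim, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 1),
)
v_opt = torch.optim.Adam(value_net.parameters(), lr=3e-4)

theta = torch.zeros(theta_dim)  # current (flattened) policy parameters

def evaluate_returns(thetas: torch.Tensor) -> torch.Tensor:
    # Stand-in for rolling out each parameter vector in a massively
    # parallel GPU-based simulator; a toy quadratic "return" keeps the
    # sketch self-contained and runnable.
    return -(thetas ** 2).sum(dim=-1)

for step in range(1000):
    # Exploration: scaled Gaussian perturbations around the current
    # parameters, each evaluated as its own policy in parallel.
    perturbed = theta + sigma * torch.randn(batch_size, theta_dim)
    returns = evaluate_returns(perturbed)            # shape: (batch_size,)

    # Critic update: regress V(theta_i) onto the observed returns.
    v_loss = ((value_net(perturbed).squeeze(-1) - returns) ** 2).mean()
    v_opt.zero_grad()
    v_loss.backward()
    v_opt.step()

    # Policy update: direct gradient ascent on V(theta) with respect to
    # the policy parameters themselves.
    theta_var = theta.clone().requires_grad_(True)
    value_net(theta_var).sum().backward()
    with torch.no_grad():
        theta = theta + 1e-3 * theta_var.grad
        theta = theta.clamp(-clip, clip)  # weight clipping curbs unbounded growth

In this reading, the weight clipping and the perturbation scale directly target the two failure modes named in the abstract: unrestricted parameter growth and inefficient exploration in the policy parameter space.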
