Publication
Interpretable and Editable Programmatic Tree Policies for Reinforcement Learning
Hector Kohler; Quentin Delfosse; Riad Akrour; Kristian Kersting; Philippe Preux
In: Computing Research Repository eprint Journal (CoRR), Vol. abs/2405.14956, Pages 1-27, arXiv, 2024.
Abstract
Deep reinforcement learning agents are prone to goal misalignments. The black-box
nature of their policies hinders the detection and correction of such misalignments,
and the trust necessary for real-world deployment. So far, solutions learning
interpretable policies are inefficient or require many human priors. We propose
INTERPRETER, a fast distillation method producing INTerpretable Editable tRee
Programs for ReinforcEmenT lEaRning. We empirically demonstrate that INTERPRETER's
compact tree programs match oracles across a diverse set of sequential
decision tasks and evaluate the impact of our design choices on interpretability
and performance. We show that our policies can be interpreted and edited to
correct misalignments on Atari games and to explain real farming strategies.
