Rein Houthooft
Rein Houthooft

Evolved Policy Gradients

We’re releasing an experimental metalearning approach called Evolved Policy Gradients, a method that evolves the loss function of learning agents, which can enable fast training on novel tasks. Agents trained with EPG can succeed at basic tasks at test time that were outside their training regime, like learning to navigate

Better Exploration with Parameter Noise

We’ve found that adding adaptive noise to the parameters of reinforcement learning algorithms frequently boosts performance. This exploration method is simple to implement and very rarely decreases performance, so it’s worth trying on any problem.

Action Space Noise Parameter Space Noise

*Parameter noise helps algorithms more

Better Exploration with Parameter Noise

We’ve found that adding adaptive noise to the parameters of reinforcement learning algorithms frequently boosts performance. This exploration method is simple to implement and very rarely decreases performance, so it’s worth trying on any problem.

Action Space Noise Parameter Space Noise

Parameter