Oleg Klimov

Retro Contest: Results

The first run of our Retro Contest — exploring the development of algorithms that can generalize from previous experience — is now complete. Though many approaches were tried, top results all came from tuning or extending existing algorithms such as PPO and Rainbow. There’s a long way to go: top performance was…

Retro Contest

We’re launching a transfer learning contest that measures a reinforcement learning algorithm’s ability to generalize from previous experience. In typical RL research, algorithms are tested in the same environment where they were trained, which favors algorithms that are good at memorization and have many hyperparameters. Instead, our contest tests an…

Proximal Policy Optimization

We’re releasing a new class of reinforcement learning algorithms, Proximal Policy Optimization (PPO), which perform comparably or better than state-of-the-art approaches while being much simpler to implement and tune. PPO has become the default reinforcement learning algorithm at OpenAI because of its ease of use and good performance.
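For context, the core of PPO is the clipped surrogate objective introduced in the paper. Below is a minimal NumPy sketch of that objective; the function name is illustrative, and clip_eps=0.2 follows a commonly used setting from the paper, not this post.

import numpy as np

def ppo_clip_loss(ratio, advantage, clip_eps=0.2):
    # ratio:     pi_theta(a|s) / pi_theta_old(a|s), one entry per sample
    # advantage: advantage estimate A_t, one entry per sample
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # The elementwise minimum is a pessimistic bound on the surrogate
    # objective, removing the incentive to move the policy too far in
    # a single update.
    return np.mean(np.minimum(unclipped, clipped))

In a full implementation this term is maximized jointly with a value-function loss and an entropy bonus, as described in the PPO paper.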
