Oleg Klimov – Jay van Zyl @ ecosystem.Ai

Retro Contest: Results

OpenAI June 22, 2018

The first run of our Retro Contest — exploring the development of algorithms that can generalize from previous experience — is now complete. Though many approaches were tried, top results all came from tuning or extending existing algorithms such as PPO and Rainbow. There’s a long way to go: top performance was

Alex Nichol Christopher Hesse John Schulman Larissa Schiavo Oleg Klimov Vicki Pfau

Gym Retro

OpenAI May 25, 2018

We’re releasing the full version of Gym Retro, a platform for reinforcement learning research on games. This brings our publicly released game count from around 70 Atari games and 30 Sega games to over 1,000 games across a variety of backing emulators. We’re also releasing the tool we use to

Alex Nichol Christopher Hesse John Schulman Larissa Schiavo Oleg Klimov Vicki Pfau

Retro Contest

OpenAI April 5, 2018

We’re launching a transfer learning contest that measures a reinforcement learning algorithm’s ability to generalize from previous experience. In typical RL research, algorithms are tested in the same environment where they were trained, which favors algorithms that are good at memorization and have many hyperparameters. Instead, our contest tests an

Alec Radford Filip Wolski John Schulman Oleg Klimov Prafulla Dhariwal

Proximal Policy Optimization

OpenAI July 20, 2017

We’re releasing a new class of reinforcement learning algorithms, Proximal Policy Optimization (PPO), which perform comparably to or better than state-of-the-art approaches while being much simpler to implement and tune. PPO has become the default reinforcement learning algorithm at OpenAI because of its ease of use and good performance.

