John Schulman

Retro Contest: Results

The first run of our Retro Contest — exploring the development of algorithms that can generalize from previous experience — is now complete. Though many approaches were tried, top results all came from tuning or extending existing algorithms such as PPO and Rainbow. There’s a long way to go: top performance was…

Retro Contest

We’re launching a transfer learning contest that measures a reinforcement learning algorithm’s ability to generalize from previous experience. In typical RL research, algorithms are tested in the same environment where they were trained, which favors algorithms that are good at memorization and have many hyperparameters. Instead, our contest tests an…

Reptile: A Scalable Meta-Learning Algorithm

We’ve developed a simple meta-learning algorithm called Reptile, which works by repeatedly sampling a task, performing stochastic gradient descent on it, and updating the initial parameters towards the final parameters learned on that task. This method performs as well as MAML, a broadly applicable meta-learning algorithm, while being simpler to…
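
In rough pseudocode terms, the Reptile update looks like the sketch below. This is a minimal illustration of the sample-task/inner-SGD/interpolate loop described above, not our released implementation; `sample_task` and `task.loss` are hypothetical stand-ins for a task sampler and a per-task loss.

```python
import copy
import torch

def reptile_step(model, sample_task, inner_steps=5, inner_lr=0.01, outer_lr=0.1):
    """One Reptile meta-update: adapt to a sampled task with SGD, then move
    the initialization toward the task-adapted parameters."""
    task = sample_task()                      # hypothetical task sampler
    adapted = copy.deepcopy(model)            # start from the current initialization
    opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    for _ in range(inner_steps):              # a few SGD steps on this task
        loss = task.loss(adapted)             # hypothetical per-task loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Interpolate: theta <- theta + outer_lr * (theta_task - theta)
    with torch.no_grad():
        for p, p_adapted in zip(model.parameters(), adapted.parameters()):
            p.add_(outer_lr * (p_adapted - p))
```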

Learning a Hierarchy

We’ve developed a hierarchical reinforcement learning algorithm that learns high-level actions useful for solving a range of tasks, allowing it to quickly solve tasks that require thousands of timesteps. Applied to a set of navigation problems, the algorithm discovers a set of high-level actions for walking and crawling in different directions…
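
The two-level structure can be pictured with a minimal sketch like the one below: a master policy picks one of several sub-policies every few timesteps, and the chosen sub-policy emits low-level actions. The network sizes, number of sub-policies, and switching period here are illustrative assumptions, not the values or architecture from the actual work.

```python
import torch

class HierarchicalPolicy(torch.nn.Module):
    """Two-level policy sketch: a master network picks one of K sub-policies
    every `period` timesteps; the chosen sub-policy emits low-level actions."""

    def __init__(self, obs_dim, act_dim, num_subpolicies=4, period=10):
        super().__init__()
        # Illustrative single-layer networks; real architectures would be larger.
        self.master = torch.nn.Linear(obs_dim, num_subpolicies)
        self.subpolicies = torch.nn.ModuleList(
            [torch.nn.Linear(obs_dim, act_dim) for _ in range(num_subpolicies)]
        )
        self.period = period
        self.t = 0
        self.current = 0

    def act(self, obs):
        # Re-select the high-level action (sub-policy index) every `period` steps.
        if self.t % self.period == 0:
            master_logits = self.master(obs)
            self.current = torch.distributions.Categorical(logits=master_logits).sample().item()
        self.t += 1
        act_logits = self.subpolicies[self.current](obs)
        return torch.distributions.Categorical(logits=act_logits).sample().item()
```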

OpenAI Baselines: ACKTR & A2C

We’re releasing two new OpenAI Baselines implementations: ACKTR and A2C. A2C is a synchronous, deterministic variant of Asynchronous Advantage Actor Critic (A3C), which we’ve found gives equal performance. ACKTR is a more sample-efficient reinforcement learning algorithm than TRPO and A2C, and requires only slightly more computation than A2C per update.
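
For reference, a single A2C update on a batch of synchronously collected rollout data looks roughly like the sketch below. This is a minimal illustration of the advantage actor-critic objective, not the Baselines implementation itself; `policy`, `value_fn`, and the `rollout` tensors are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def a2c_update(policy, value_fn, optimizer, rollout,
               value_coef=0.5, entropy_coef=0.01):
    """One synchronous advantage actor-critic (A2C) update."""
    obs, actions, returns = rollout           # hypothetical rollout tensors
    logits = policy(obs)                      # action logits from the policy network
    values = value_fn(obs).squeeze(-1)        # state-value baseline
    dist = torch.distributions.Categorical(logits=logits)

    advantages = returns - values.detach()    # advantage = return - baseline
    policy_loss = -(dist.log_prob(actions) * advantages).mean()
    value_loss = F.mse_loss(values, returns)
    entropy = dist.entropy().mean()           # entropy bonus encourages exploration

    loss = policy_loss + value_coef * value_loss - entropy_coef * entropy
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```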
