Alec Radford – Jay van Zyl @ ecosystem.Ai

Improving Language Understanding with Unsupervised Learning

OpenAI June 11, 2018

We’ve obtained state-of-the-art results on a suite of diverse language tasks with a scalable, task-agnostic system, which we’re also releasing. Our approach is a combination of two existing ideas: transformers and unsupervised pre-training. These results provide a convincing example that pairing supervised learning methods with unsupervised pre-training works very well;

Alec Radford, Durk Kingma, Scott Gray

Block-Sparse GPU Kernels

OpenAI December 6, 2017

We’re releasing highly-optimized GPU kernels for an underexplored class of neural network architectures: networks with block-sparse weights. Depending on the chosen sparsity, these kernels can run orders of magnitude faster than cuBLAS or cuSPARSE. We’ve used them to attain state-of-the-art results in text sentiment analysis and generative modeling of

Alec Radford, Elman Mansimov, John Schulman, Shun Liao, Yuhuai Wu

OpenAI Baselines: ACKTR & A2C

OpenAI August 18, 2017

We’re releasing two new OpenAI Baselines implementations: ACKTR and A2C. A2C is a synchronous, deterministic variant of Asynchronous Advantage Actor Critic (A3C) which we’ve found gives equal performance. ACKTR is a more sample-efficient reinforcement learning algorithm than TRPO and A2C, and requires only slightly more computation than A2C per update.

Alec Radford, Filip Wolski, John Schulman, Oleg Klimov, Prafulla Dhariwal

Proximal Policy Optimization

OpenAI July 20, 2017

We’re releasing a new class of reinforcement learning algorithms, Proximal Policy Optimization (PPO), which perform comparably to or better than state-of-the-art approaches while being much simpler to implement and tune. PPO has become the default reinforcement learning algorithm at OpenAI because of its ease of use and good performance.
