Generalizing from Simulation

Our latest robotics techniques allow robot controllers, trained entirely in simulation and deployed on physical robots, to react to unplanned changes in the environment as they solve simple tasks. That is, we’ve used these techniques to build closed-loop systems rather than open-loop ones as before. The simulator need not match
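
To make the open-loop versus closed-loop distinction concrete, here is a tiny sketch in Python. It is an illustration only, not the controllers described above: the one-dimensional dynamics, the constant unmodeled push, and the feedback gain are all invented for the example.

    # Toy 1-D illustration of open-loop vs. closed-loop control (illustration only,
    # not the robot controllers above). A constant unmodeled push stands in for an
    # unplanned change in the environment.
    TARGET, STEPS, PUSH = 10.0, 20, 0.2

    def run(controller):
        x = 0.0
        for t in range(STEPS):
            u = controller(t, x)   # choose an action
            x += u + PUSH          # true dynamics include the unmodeled push
        return x

    # Open loop: replay a plan computed assuming the push does not exist.
    plan = [TARGET / STEPS] * STEPS
    open_loop = lambda t, x: plan[t]

    # Closed loop: re-observe the state every step and correct toward the target.
    closed_loop = lambda t, x: 0.5 * (TARGET - x)

    print("open-loop final error:  ", abs(TARGET - run(open_loop)))    # drifts by STEPS * PUSH
    print("closed-loop final error:", abs(TARGET - run(closed_loop)))  # stays near the target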

Competitive Self-Play

We’ve found that self-play allows simulated AIs to discover physical skills like tackling, ducking, faking, kicking, catching, and diving for the ball, without explicitly designing an environment with these skills in mind. Self-play ensures that the environment is always the right difficulty for an AI to improve. Taken alongside our
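
As a rough sketch of why self-play keeps the task at the right difficulty (an illustration only, not our training setup; the rock-paper-scissors game, learning rate, and snapshot schedule are invented for the example), an agent can train against frozen copies of its own past policies, so its opponents improve at the same pace it does:

    import numpy as np

    # Minimal self-play sketch: the opponent pool is always made of past selves,
    # so the opposition stays at a comparable skill level. The game and all
    # hyperparameters here are made up for the example.
    np.random.seed(0)
    PAYOFF = np.array([[ 0., -1.,  1.],   # rock-paper-scissors payoff for the row player
                       [ 1.,  0., -1.],
                       [-1.,  1.,  0.]])

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    logits = np.zeros(3)          # current policy parameters
    pool = [softmax(logits)]      # frozen snapshots of past selves
    lr = 0.1

    for step in range(1, 2001):
        opponent = pool[np.random.randint(len(pool))]   # the opponent is always a past self
        p = softmax(logits)
        # Exact gradient of the expected payoff p^T A q with respect to the logits.
        grad = (np.diag(p) - np.outer(p, p)) @ (PAYOFF @ opponent)
        logits += lr * grad
        if step % 100 == 0:
            pool.append(softmax(logits))                # snapshot the improved policy

    print("current policy:", np.round(softmax(logits), 2), "| opponents in pool:", len(pool))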

Nonlinear Computation in Deep Linear Networks

We’ve shown that deep linear networks — as implemented using floating-point arithmetic — are not actually linear and can perform nonlinear computation. We used evolution strategies to find parameters in linear networks that exploit this trait, letting us solve non-trivial problems.

Neural networks consist of stacks of a linear layer followed by
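
To see the raw effect being exploited, here is a small self-contained demonstration; the specific scale factors are arbitrary and this is not the exact construction from the post, which works near float32’s smallest representable values. A map that is exactly linear over the reals stops being linear once evaluated in float32, because scaling a tiny input underflows to zero; evolution strategies are then just the search tool used to find network parameters that exploit this regime.

    import numpy as np

    # f is exactly linear over the reals: scale down by 1e-7, then scale back up.
    # The scale factors are arbitrary choices for this demonstration.
    def f(x):
        x = np.float32(x)
        return np.float32(1e7) * (np.float32(1e-7) * x)

    tiny = np.float32(1e-39)             # a subnormal float32 value
    print(f(tiny))                       # 0.0   -- 1e-46 underflows to zero in float32
    print(f(np.float32(1e8) * tiny))     # ~1e-31
    print(np.float32(1e8) * f(tiny))     # 0.0   -- so f(c*x) != c*f(x): not linear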

Learning to Model Other Minds

We’re releasing an algorithm which accounts for the fact that other agents are learning too, and discovers self-interested yet collaborative strategies like tit-for-tat in the iterated prisoner’s dilemma. This algorithm, Learning with Opponent-Learning Awareness (LOLA), is a small step towards agents that model other minds.

LOLA, a collaboration
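
As a toy sketch of the core update (an illustration of the idea, not the released LOLA code; the step sizes are guesses, and the published rule uses a first-order approximation of the lookahead written out here), each player differentiates its value through the other player’s anticipated naive learning step in the iterated prisoner’s dilemma with memory-one policies:

    import numpy as np

    # Toy opponent-learning-aware update on the iterated prisoner's dilemma with
    # memory-1 policies. Gradients are finite differences taken through the
    # opponent's anticipated learning step. Illustration only: not the released
    # LOLA implementation, and the step sizes below are guesses, not the paper's.
    GAMMA, LR, ETA = 0.96, 0.1, 0.3          # discount, our step size, opponent's assumed step size
    R1 = np.array([-1., -3.,  0., -2.])      # player 1 payoff in states [CC, CD, DC, DD]
    R2 = np.array([-1.,  0., -3., -2.])      # player 2 payoff in the same states
    SWAP = [0, 2, 1, 3]                      # the same states seen from player 2's side
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))

    def joint(a, b):
        """Distribution over [CC, CD, DC, DD] given cooperation probabilities a and b."""
        return np.stack([a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)], axis=-1)

    def values(th1, th2):
        """Exact discounted returns; each player has 5 logits: first move + one per state."""
        p1, p2 = sig(th1), sig(th2)
        s0 = joint(p1[0], p2[0])                       # state distribution after the first move
        T = joint(p1[1:], p2[1:][SWAP])                # 4x4 state transition matrix
        future = np.linalg.solve(np.eye(4) - GAMMA * T, np.stack([R1, R2], axis=-1))
        return s0 @ future                             # [V1, V2]

    def grad(f, x, eps=1e-4):
        """Central finite-difference gradient of a scalar function of a vector."""
        g = np.zeros_like(x)
        for i in range(len(x)):
            d = np.zeros_like(x); d[i] = eps
            g[i] = (f(x + d) - f(x - d)) / (2 * eps)
        return g

    def lola_step(th1, th2):
        # Differentiate our value *after* the opponent's anticipated naive step;
        # because that step depends on our parameters, the gradient contains a
        # term that shapes how the opponent learns.
        def lookahead(t1):
            t2_next = th2 + ETA * grad(lambda t2: values(t1, t2)[1], th2)
            return values(t1, t2_next)[0]
        return th1 + LR * grad(lookahead, th1)

    np.random.seed(0)
    th1, th2 = np.random.randn(5), np.random.randn(5)
    for _ in range(300):
        th1, th2 = lola_step(th1, th2), lola_step(th2, th1)
    print("P(cooperate | first move, CC, CD, DC, DD):")
    print("player 1:", np.round(sig(th1), 2))
    print("player 2:", np.round(sig(th2), 2))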

OpenAI Baselines: ACKTR & A2C

We’re releasing two new OpenAI Baselines implementations: ACKTR and A2C. A2C is a synchronous, deterministic variant of Asynchronous Advantage Actor-Critic (A3C) which we’ve found gives performance equal to A3C’s. ACKTR is a more sample-efficient reinforcement learning algorithm than TRPO and A2C, and requires only slightly more computation than A2C per update.
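
For orientation, the sketch below lays out the quantities a single synchronous A2C update is built from: bootstrapped n-step returns, advantages, and the policy, value, and entropy loss terms. The rollout arrays, names, and coefficients are made-up stand-ins for the network and parallel environments, not the Baselines API.

    import numpy as np

    # A minimal sketch of what one synchronous A2C update computes (illustration
    # only; the released Baselines code handles the network, the optimizer, and
    # stepping the parallel environments). All arrays below are fake rollout data.
    np.random.seed(0)
    n_steps, n_envs = 5, 4
    gamma, value_coef, entropy_coef = 0.99, 0.5, 0.01

    rewards   = np.random.rand(n_steps, n_envs)            # r_t
    dones     = np.zeros((n_steps, n_envs))                 # episode-end flags
    values    = np.random.rand(n_steps, n_envs)             # V(s_t) from the critic
    last_vals = np.random.rand(n_envs)                      # V(s_n) used to bootstrap
    logprobs  = np.log(np.random.rand(n_steps, n_envs))     # log pi(a_t | s_t)
    entropies = np.random.rand(n_steps, n_envs)             # policy entropy at s_t

    # Bootstrapped n-step returns, computed backwards through the rollout.
    returns = np.zeros_like(rewards)
    running = last_vals
    for t in reversed(range(n_steps)):
        running = rewards[t] + gamma * (1.0 - dones[t]) * running
        returns[t] = running

    advantages = returns - values

    # The three terms of the A2C objective (advantages are treated as constants
    # in the policy term, so no gradient flows through the critic there).
    policy_loss  = -(logprobs * advantages).mean()
    value_loss   = 0.5 * ((returns - values) ** 2).mean()
    entropy_loss = -entropies.mean()
    total_loss = policy_loss + value_coef * value_loss + entropy_coef * entropy_loss
    print("policy / value / entropy losses:", policy_loss, value_loss, entropy_loss)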