DeepMind papers at ICLR 2018

Between 30 April and 3 May, hundreds of researchers and engineers will gather in Vancouver, Canada, for the Sixth International Conference on Learning Representations (ICLR). Here you can read details of all of DeepMind's accepted papers and find out where you can see the accompanying poster sessions and talks.

Maximum a posteriori policy optimisation

Authors: Abbas Abdolmaleki, Jost Tobias Springenberg, Nicolas Heess, Yuval Tassa, Remi Munos

We introduce a new algorithm for reinforcement learning called Maximum a posteriori Policy Optimisation (MPO), based on coordinate ascent on a relative entropy objective. We show that several existing methods can be directly related to our derivation. We develop two off-policy algorithms and demonstrate that they are competitive with the state of the art in deep reinforcement learning.
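To give a feel for what "coordinate ascent on a relative entropy objective" can look like, here is a deliberately tiny sketch. It is our own simplified illustration, not DeepMind's implementation, and it assumes a single state, a handful of discrete actions and known Q-values (the `q_values` list and all function names are hypothetical). The E-step builds a non-parametric target q(a) ∝ π_old(a) exp(Q(a)/η), choosing the temperature η so that KL(q ‖ π_old) equals a budget `eps`. The M-step then fits the policy to q by weighted maximum likelihood. Because the policy here is a plain table of probabilities, that fit is exact. A neural-network policy would instead take gradient steps on the same weighted log-likelihood, and any extra regularisation used in the full method is omitted.

```python
import math


def normalize_log(logits):
    """Turn unnormalized log-weights into probabilities (stable log-sum-exp)."""
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def kl(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def tempered(pi_old, q_values, eta):
    """Non-parametric E-step target: q(a) proportional to pi_old(a) * exp(Q(a) / eta)."""
    logits = [(math.log(p) if p > 0.0 else -math.inf) + qv / eta
              for p, qv in zip(pi_old, q_values)]
    return normalize_log(logits)


def e_step(pi_old, q_values, eps, eta_lo=1e-3, eta_hi=1e3, iters=100):
    """Choose eta by bisection on the derivative of the convex dual.

    g(eta) = eta*eps + eta*log sum_a pi_old(a) exp(Q(a)/eta), and
    g'(eta) = eps - KL(q_eta || pi_old), so the optimum sits where KL == eps.
    """
    if kl(tempered(pi_old, q_values, eta_lo), pi_old) <= eps:
        # The KL budget is slack even at the smallest eta allowed: near-greedy target.
        return tempered(pi_old, q_values, eta_lo)
    lo, hi = eta_lo, eta_hi
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisect in log-space, since eta spans many decades
        if kl(tempered(pi_old, q_values, mid), pi_old) > eps:
            lo = mid  # target moved too far: raise the temperature
        else:
            hi = mid
    return tempered(pi_old, q_values, hi)


def m_step(q_target):
    """Weighted maximum likelihood: argmax_pi sum_a q(a) log pi(a).

    For a tabular categorical policy the maximiser is q itself.
    """
    return list(q_target)


def mpo_sketch(q_values, eps=0.1, steps=20):
    """Alternate E- and M-steps starting from a uniform policy."""
    pi = [1.0 / len(q_values)] * len(q_values)
    for _ in range(steps):
        pi = m_step(e_step(pi, q_values, eps))
    return pi


if __name__ == "__main__":
    print(mpo_sketch([1.0, 2.0, 3.0]))
```

Each iteration moves the policy by roughly `eps` in KL, so probability mass shifts towards the highest-valued action gradually rather than in one greedy jump. That trust-region-like behaviour is the point of constraining the relative entropy in the E-step.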