Tim Salimans

Learning Montezuma’s Revenge from a Single Demonstration

We’ve trained an agent to achieve a high score of 74,500 on Montezuma’s Revenge from a single human demonstration, better than any previously published result. Our algorithm is simple: the agent plays a sequence of games starting from carefully chosen states from the demonstration, and learns from them by