Preserving Outputs Precisely while Adaptively Rescaling Targets
Multi-task learning - allowing a single agent to learn how to solve many different tasks - is a longstanding objective for artificial intelligence research. Recently, there has been a lot of excellent progress, with agents likeDQN able to use the same algorithm to learn to play multiple games including Breakout and Pong. These algorithms were used to train individual expert agents for each task. As artificial intelligence research advances to more complex real world domains, building a single general agent - as opposed to multiple expert agents - to learn to perform multiple tasks will be crucial. However, so far, this has proven to be a significant challenge.One reason is that there are often differences in the reward scales our reinforcement learning agents use to judge success, leading them to focus on tasks where the reward is arbitrarilyhigher. For example, in the Atari game Pong, the agent receives a reward of either -1, 0, or +1 per step. In contrast, an agent playing Ms. Pac-Man can obtain hundreds or thousands of points in a single step. Even if the size of individual rewards is comparable, the frequency of rewards can change over time as the agent gets better.