Many real-world tasks involve multiple agents with partial observability and limited communication. Learning is challenging in these settings due to local viewpoints of agents, which perceive the world as non-stationary due to concurrently-exploring teammates. Approaches that learn specialized policies for individual tasks face major problems when applied to the real world: not only do agents have to learn and store a distinct policy for each task, but in practice the identity of the task is often non-observable, making these algorithms inapplicable. This paper formalizes and addresses the problem of multi-task multi-agent reinforcement learning under partial observability. We introduce a decentralized single-task learning approach that is robust to concurrent interactions of teammates, and present an approach for distilling single-task policies into a unified policy that performs well across multiple related tasks, without explicit provision of task identity.
Submitted 17 Mar 2017 to Learning
Published 21 Mar 2017
Updated 25 Mar 2017