Cooperative multi-agent systems can be naturally used to model many real world problems, such as network packet routing and the coordination of autonomous vehicles. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action, while keeping the other agents' actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of StarCraft unit micromanagement, using a decentralised variant with significant partial observability. COMA significantly improves average performance over other multi-agent actor-critic methods in this setting, and the best performing agents are competitive with state-of-the-art centralised controllers that get access to the full state.
We consider goods that can be shared with k-hop neighbors (i.e., the set of nodes within k hops from an owner) on a social network. We examine incentives to buy such a good by devising game-theoretic models where each node decides whether to buy the good or free ride. First, we find that social inefficiency, specifically excessive purchase of the good, occurs in Nash equilibria. Second, the social inefficiency decreases as k increases and thus a good can be shared with more nodes. Third, and most importantly, the social inefficiency can also be significantly reduced by charging free riders an access cost and paying it to owners, leading to the conclusion that organizations and system designers should impose such a cost. These findings are supported by our theoretical analysis in terms of the price of anarchy and the price of stability; and by simulations based on synthetic and real social networks.
This paper proposes three different distributed event-triggered control algorithms to achieve leader-follower consensus for a network of Euler-Lagrange agents. We firstly propose two model-independent algorithms for a subclass of Euler-Lagrange agents without the vector of gravitational potential forces. By model-independent, we mean that each agent can execute its algorithm with no knowledge of the agent self-dynamics. A variable-gain algorithm is employed when the sensing graph is undirected; algorithm parameters are selected in a fully distributed manner with much greater flexibility compared to all previous work concerning event-triggered consensus problems. When the sensing graph is directed, a constant-gain algorithm is employed. The control gains must be centrally designed to exceed several lower bounding inequalities which require limited knowledge of bounds on the matrices describing the agent dynamics, bounds on network topology information and bounds on the initial conditions. When the Euler-Lagrange agents have dynamics which include the vector of gravitational potential forces, an adaptive algorithm is proposed which requires more information about the agent dynamics but can estimate uncertain agent parameters. For each algorithm, a trigger function is proposed to govern the event update times. At each event, the controller is updated, which ensures that the control input is piecewise constant and saves energy resources. We analyse each controllers and trigger function and exclude Zeno behaviour. Extensive simulations show 1) the advantages of our proposed trigger function as compared to those in existing literature, and 2) the effectiveness of our proposed controllers.
Motivated by the common academic problem of allocating papers to referees for conference reviewing we propose a novel mechanism for solving the assignment problem when we have a two sided matching problem with preferences from one side (the agents/reviewers) over the other side (the objects/papers) and both sides have capacity constraints. The assignment problem is a fundamental problem in both computer science and economics with application in many areas including task and resource allocation. We draw inspiration from multi-criteria decision making and voting and use order weighted averages (OWAs) to propose a novel and flexible class of algorithms for the assignment problem. We show an algorithm for finding a $\Sigma$-OWA assignment in polynomial time, in contrast to the NP-hardness of finding an egalitarian assignment. Inspired by this setting we observe an interesting connection between our model and the classic proportional multi-winner election problem in social choice.