# Multiagent Systems (cs.MA)

• Successful analysis of player skills in video games has important impacts on the process of enhancing player experience without undermining their continuous skill development. Moreover, player skill analysis becomes more intriguing in team-based video games because such form of study can help discover useful factors in effective team formation. In this paper, we consider the problem of skill decomposition in MOBA (MultiPlayer Online Battle Arena) games, with the goal to understand what player skill factors are essential for the outcome of a game match. To understand the construct of MOBA player skills, we utilize various skill-based predictive models to decompose player skills into interpretative parts, the impact of which are assessed in statistical terms. We apply this analysis approach on two widely known MOBAs, namely League of Legends (LoL) and Defense of the Ancients 2 (DOTA2). The finding is that base skills of in-game avatars, base skills of players, and players' champion-specific skills are three prominent skill components influencing LoL's match outcomes, while those of DOTA2 are mainly impacted by in-game avatars' base skills but not much by the other two.
• This paper addresses tracking of a moving target in a multi-agent network. The target follows a linear dynamics corrupted by an adversarial noise, i.e., the noise is not generated from a statistical distribution. The location of the target at each time induces a global time-varying loss function, and the global loss is a sum of local losses, each of which is associated to one agent. Agents noisy observations could be nonlinear. We formulate this problem as a distributed online optimization where agents communicate with each other to track the minimizer of the global loss. We then propose a decentralized version of the Mirror Descent algorithm and provide the non-asymptotic analysis of the problem. Using the notion of dynamic regret, we measure the performance of our algorithm versus its offline counterpart in the centralized setting. We prove that the bound on dynamic regret scales inversely in the network spectral gap, and it represents the adversarial noise causing deviation with respect to the linear dynamics. Our result subsumes a number of results in the distributed optimization literature. Finally, in a numerical experiment, we verify that our algorithm can be simply implemented for multi-agent tracking with nonlinear observations.
• In multi-robot systems where a central decision maker is specifying the movement of each individual robot, a communication failure can severely impair the performance of the system. This paper develops a motion strategy that allows robots to safely handle critical communication failures for such multi-robot architectures. For each robot, the proposed algorithm computes a time horizon over which collisions with other robots are guaranteed not to occur. These safe time horizons are included in the commands being transmitted to the individual robots. In the event of a communication failure, the robots execute the last received velocity commands for the corresponding safe time horizons leading to a provably safe open-loop motion strategy. The resulting algorithm is computationally effective and is agnostic to the task that the robots are performing. The efficacy of the strategy is verified in simulation as well as on a team of differential-drive mobile robots.
• We consider a swarm of $n$ autonomous mobile robots, distributed on a 2-dimensional grid. A basic task for such a swarm is the gathering process: all robots have to gather at one (not predefined) place. The work in this paper is motivated by the following insight: On one side, for swarms of robots distributed in the 2-dimensional Euclidean space, several gathering algorithms are known for extremely simple robots that are oblivious, have bounded viewing radius, no compass, and no "flags" to communicate a status to others. On the other side, in case of the 2-dimensional grid, the only known gathering algorithms for robots with bounded viewing radius without compass, need to memorize a constant number of rounds and need flags. In this paper we contribute the, to the best of our knowledge, first gathering algorithm on the grid, which works for anonymous, oblivious robots with bounded viewing range, without any further means of communication and without any memory. We prove its correctness and an $O(n^2)$ time bound. This time bound matches those of the best known algorithms for the Euclidean plane mentioned above.
• Matrix games like Prisoner's Dilemma have guided research on social dilemmas for decades. However, they necessarily treat the choice to cooperate or defect as an atomic action. In real-world social dilemmas these choices are temporally extended. Cooperativeness is a property that applies to policies, not elementary actions. We introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions. We analyze the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games we introduce here: 1. a fruit Gathering game and 2. a Wolfpack hunting game. We characterize how learned behavior in each domain changes as a function of environmental factors including resource abundance. Our experiments show how conflict can emerge from competition over shared resources and shed light on how the sequential nature of real world social dilemmas affects cooperation.
• Interactions between vehicles and pedestrians have always been a major problem in traffic safety. Experienced human drivers are generally able to analyze the environment and choose driving strategies that will help them avoid crashes. What is not yet clear, however, is how automated vehicles will interact with pedestrians. This paper proposes a new method for evaluating the safety and feasibility of the driving strategy of automated vehicles when encountering unsignalized crossings. MobilEye sensors installed on buses in Ann Arbor, Michigan, collected data on 2,973 valid crossing events. A stochastic interaction model was then created using a multivariate Gaussian mixture model. This model allowed us to simulate the movements of pedestrians reacting to an oncoming vehicle when approaching unsignalized crossings, and to evaluate the passing strategies of automated vehicles. A simulation was then conducted to demonstrate the evaluation procedure.
• Social conventions govern countless behaviors all of us engage in every day, from how we greet each other to the languages we speak. But how can shared conventions emerge spontaneously in the absence of a central coordinating authority? The Naming Game model shows that networks of locally interacting individuals can spontaneously self-organize to produce global coordination. Here, we provide a gentle introduction to the main features of the model, from the dynamics observed in homogeneously mixing populations to the role played by more complex social networks, and to how slight modifications of the basic interaction rules give origin to a richer phenomenology in which more conventions can co-exist indefinitely.
• We investigate the effects of social interactions in task al- location using Evolutionary Game Theory (EGT). We propose a simple task-allocation game and study how different learning mechanisms can give rise to specialised and non- specialised colonies under different ecological conditions. By combining agent-based simulations and adaptive dynamics we show that social learning can result in colonies of generalists or specialists, depending on ecological parameters. Agent-based simulations further show that learning dynamics play a crucial role in task allocation. In particular, introspective individual learning readily favours the emergence of specialists, while a process resembling task recruitment favours the emergence of generalists.
• Multi-agent path finding (MAPF) is well-studied in artificial intelligence, robotics, theoretical computer science and operations research. We discuss issues that arise when generalizing MAPF methods to real-world scenarios and four research directions that address them. We emphasize the importance of addressing these issues as opposed to developing faster methods for the standard formulation of the MAPF problem.
• Until now mean-field-type game theory was not focused on cognitively-plausible models of choices in humans, animals, machines, robots, software-defined and mobile devices strategic interactions. This work presents some effects of users' psychology in mean-field-type games. In addition to the traditional "material" payoff modelling, psychological patterns are introduced in order to better capture and understand behaviors that are observed in engineering practice or in experimental settings. The psychological payoff value depends upon choices, mean-field states, mean-field actions, empathy and beliefs. It is shown that the affective empathy enforces mean-field equilibrium payoff equity and improves fairness between the players. It establishes equilibrium systems for such interactive decision-making problems. Basic empathy concepts are illustrated in several important problems in engineering including resource sharing, packet collision minimization, energy markets, and forwarding in Device-to-Device communications. The work conducts also an experiment with 47 people who have to decide whether to cooperate or not. The basic Interpersonal Reactivity Index of empathy metrics were used to measure the empathy distribution of each participant. Android app called Empathizer is developed to analyze systematically the data obtained from the participants. The experimental results reveal that the dominated strategies of the classical game theory are not dominated any more when users' psychology is involved, and a significant level of cooperation is observed among the users who are positively partially empathetic.
• This paper investigates the task assignment problem for multiple dispersed robots constrained by limited communication range. The robots are initially randomly distributed and need to visit several target locations while minimizing the total travel time. A centralized rendezvous-based algorithm is proposed, under which all the robots first move towards a rendezvous position until communication paths are established between every pair of robots either directly or through intermediate peers, and then one robot is chosen as the leader to make a centralized task assignment for the other robots. Furthermore, we propose a decentralized algorithm based on a single-traveling-salesman tour, which does not require all the robots to be connected through communication. We investigate the variation of the quality of the assignment solutions as the level of information sharing increases and as the communication range grows, respectively. The proposed algorithms are compared with a centralized algorithm with shared global information and a decentralized greedy algorithm respectively. Monte Carlo simulation results show the satisfying performance of the proposed algorithms.
• Feb 15 2017 cs.GT cs.MA arXiv:1702.04138v1
All-pay auctions, a common mechanism for various human and agent interactions, suffers, like many other mechanisms, from the possibility of players' failure to participate in the auction. We model such failures, and fully characterize equilibrium for this class of games, we present a symmetric equilibrium and show that under some conditions the equilibrium is unique. We reveal various properties of the equilibrium, such as the lack of influence of the most-likely-to-participate player on the behavior of the other players. We perform this analysis with two scenarios: the sum-profit model, where the auctioneer obtains the sum of all submitted bids, and the max-profit model of crowdsourcing contests, where the auctioneer can only use the best submissions and thus obtains only the winning bid. Furthermore, we examine various methods of influencing the probability of participation such as the effects of misreporting one's own probability of participating, and how influencing another player's participation chances changes the player's strategy.
• This paper studies scenarios of cyclic dominance in a coevolutionary spatial model in which game strategies and links between agents adaptively evolve over time. The Optional Prisoner's Dilemma (OPD) game is employed. The OPD is an extended version of the traditional Prisoner's Dilemma where players have a third option to abstain from playing the game. We adopt an agent-based simulation approach and use Monte Carlo methods to perform the OPD with coevolutionary rules. The necessary conditions to break the scenarios of cyclic dominance are also investigated. This work highlights that cyclic dominance is essential in the sustenance of biodiversity. Moreover, we also discuss the importance of a spatial coevolutionary model in maintaining cyclic dominance in adverse conditions.
• Online learning with streaming data in a distributed and collaborative manner can be useful in a wide range of applications. This topic has been receiving considerable attention in recent years with emphasis on both single-task and multitask scenarios. In single-task adaptation, agents cooperate to track an objective of common interest, while in multitask adaptation agents track multiple objectives simultaneously. Regularization is one useful technique to promote and exploit similarity among tasks in the latter scenario. This work examines an alternative way to model relations among tasks by assuming that they all share a common latent feature representation. As a result, a new multitask learning formulation is presented and algorithms are developed for its solution in a distributed online manner. We present a unified framework to analyze the mean-square-error performance of the adaptive strategies, and conduct simulations to illustrate the theoretical findings and potential applications.
• Managing micro-tasks on crowdsourcing marketplaces involves balancing conflicting objectives -- the quality of work, total cost incurred and time to completion. Previous agents have focused on cost-quality, or cost-time tradeoffs, limiting their real-world applicability. As a step towards this goal we present Octopus, the first AI agent that jointly manages all three objectives in tandem. Octopus is based on a computationally tractable, multi-agent formulation consisting of three components; one that sets the price per ballot to adjust the rate of completion of tasks, another that optimizes each task for quality and a third that performs task selection. We demonstrate that Octopus outperforms existing state-of-the-art approaches in simulation and experiments with real data, demonstrating its superior performance. We also deploy Octopus on Amazon Mechanical Turk to establish its ability to manage tasks in a real-world, dynamic setting.
• This paper extends and adapts an existing abstract model into an empirical metropolitan region in Brazil. The model - named SEAL: a Spatial Economic Agent-based Lab - comprehends a framework to enable public policy ex-ante analysis. The aim of the model is to use official data and municipalities spatial boundaries to allow for policy experimentation. The current version considers three markets: housing, labor and goods. Families' members age, consume, join the labor market and trade houses. A single consumption tax is collected by municipalities that invest back into quality of life improvements. We test whether a single metropolitan government - which is an aggregation of municipalities - would be in the best interest of its citizens. Preliminary results for 20 runs indicate that it may be the case. Future developments include improving performance to enable running of higher percentage of the population and a number of runs that make the model more robust.
• In many problems, agents cooperate locally so that a leader or fusion center can infer the state of every agent from probing the state of only a small number of agents. Versions of this problem arise when a fusion center reconstructs an extended physical field by accessing the state of just a few of the sensors measuring the field, or a leader monitors the formation of a team of robots. Given a link cost, the paper presents a polynomial time algorithm to design a minimum cost coordinated network dynamics followed by the agents, under an observability constraint. The problem is placed in the context of structural observability and solved even when up to k agents in the coordinated network dynamics fail.
• In human societies, people's willingness to compete and strive for better social status as well as being envious of those perceived in some way superior lead to social structures that are intrinsically hierarchical. Here we propose an agent-based, network model to mimic the ranking behaviour of individuals and its possible repercussions in human society. The main ingredient of the model is the assumption that the relevant feature of social interactions is each individual's keenness to maximise his or her status relative to others. The social networks produced by the model are homophilous and assortative, as frequently observed in human communities and most of the network properties seem quite independent of its size. However, it is seen that for small number of agents the resulting network consists of disjoint weakly connected communities while being highly assortative and homophilic. On the other hand larger networks turn out to be more cohesive with larger communities but less homophilic. We find that the reason for these changes is that larger network size allows agents to use new strategies for maximizing their social status allowing for more diverse links between them.
• We introduce the concept of a V-formation game between a controller and an attacker, where controller's goal is to maneuver the plant (a simple model of flocking dynamics) into a V-formation, and the goal of the attacker is to prevent the controller from doing so. Controllers in V-formation games utilize a new formulation of model-predictive control we call Adaptive-Horizon MPC (AMPC), giving them extraordinary power: we prove that under certain controllability assumptions, an AMPC controller is able to attain V-formation with probability 1. We define several classes of attackers, including those that in one move can remove R birds from the flock, or introduce random displacement into flock dynamics. We consider both naive attackers, whose strategies are purely probabilistic, and AMPC-enabled attackers, putting them on par strategically with the controllers. While an AMPC-enabled controller is expected to win every game with probability 1, in practice, it is resource-constrained: its maximum prediction horizon and the maximum number of game execution steps are fixed. Under these conditions, an attacker has a much better chance of winning a V-formation game. Our extensive performance evaluation of V-formation games uses statistical model checking to estimate the probability an attacker can thwart the controller. Our results show that for the bird-removal game with R = 1, the controller almost always wins (restores the flock to a V-formation). For R = 2, the game outcome critically depends on which two birds are removed. For the displacement game, our results again demonstrate that an intelligent attacker, i.e. one that uses AMPC in this case, significantly outperforms its naive counterpart that randomly executes its attack.
• Affect Control Theory (ACT) is a powerful and general sociological model of human affective interaction. ACT provides an empirically derived mathematical model of culturally shared sentiments as heuristic guides for human decision making. BayesACT, a variant on classical ACT, combines affective reasoning with cognitive (denotative or logical) reasoning as is traditionally found in AI. Bayes\-ACT allows for the creation of agents that are both emotionally guided and goal-directed. In this work, we simulate BayesACT agents in the Iterated Networked Prisoner's Dilemma (INPD), and we show four out of five known properties of human play in INPD are replicated by these socio-affective agents. In particular, we show how the observed human behaviours of network structure invariance, anti-correlation of cooperation and reward, and player type stratification are all clearly emergent properties of the networked BayesACT agents. We further show that decision hyteresis (Moody Conditional Cooperation) is replicated by BayesACT agents in over $2/3$ of the cases we have considered. In contrast, previously used imitation-based agents are only able to replicate one of the five properties. We discuss the implications of these findings in the development of human-agent societies.
• Jan 30 2017 cs.MA cs.AI arXiv:1701.08125v1
Organic Computing is an initiative in the field of systems engineering that proposed to make use of concepts such as self-adaptation and self-organisation to increase the robustness of technical systems. Based on the observation that traditional design and operation concepts reach their limits, transferring more autonomy to the systems themselves should result in a reduction of complexity for users, administrators, and developers. However, there seems to be a need for an updated definition of the term "Organic Computing", of desired properties of technical, organic systems, and the objectives of the Organic Computing initiative. With this article, we will address these points.
• This paper studies optimal communication and coordination strategies in cyber-physical systems for both defender and attacker within a game-theoretic framework. We model the communication network of a cyber-physical system as a sensor network which involves one single Gaussian source observed by many sensors, subject to additive independent Gaussian observation noises. The sensors communicate with the estimator over a coherent Gaussian multiple access channel. The aim of the receiver is to reconstruct the underlying source with minimum mean squared error. The scenario of interest here is one where some of the sensors are captured by the attacker and they act as the adversary (jammer): they strive to maximize distortion. The receiver (estimator) knows the captured sensors but still cannot simply ignore them due to the multiple access channel, i.e., the outputs of all sensors are summed to generate the estimator input. We show that the ability of transmitter sensors to secretly agree on a random event, that is "coordination", plays a key role in the analysis...
• How to self-localize large teams of underwater nodes using only noisy range measurements? How to do it in a distributed way, and incorporating dynamics into the problem? How to reject outliers and produce trustworthy position estimates? The stringent acoustic communication channel and the accuracy needs of our geophysical survey application demand faster and more accurate localization methods. We approach dynamic localization as a MAP estimation problem where the prior encodes dynamics, and we devise a convex relaxation method that takes advantage of previous estimates at each measurement acquisition step; The algorithm converges at an optimal rate for first order methods. LocDyn is distributed: there is no fusion center responsible for processing acquired data and the same simple computations are performed for each node. LocDyn is accurate: experiments attest to a smaller positioning error than a comparable Kalman filter. LocDyn is robust: it rejects outlier noise, while the comparing methods succumb in terms of positioning error.
• The high mobility of Army tactical networks, combined with their close proximity to hostile actors, elevates the risks associated with short-range network attacks. The connectivity model for such short range connections under active operations is extremely fluid, and highly dependent upon the physical space within which the element is operating, as well as the patterns of movement within that space. To handle these dependencies, we introduce the notion of "key cyber-physical terrain": locations within an area of operations that allow for effective control over the spread of proximity-dependent malware in a mobile tactical network, even as the elements of that network are in constant motion with an unpredictable pattern of node-to-node connectivity. We provide an analysis of movement models and approximation strategies for finding such critical nodes, and demonstrate via simulation that we can identify such key cyber-physical terrain quickly and effectively.
• This paper addresses the problem of synchronizing orthogonal matrices over directed graphs. For synchronized transformations (or matrices), composite transformations over loops equal the identity. We formulate the synchronization problem as a least-squares optimization problem with nonlinear constraints. The synchronization problem appears as one of the key components in applications ranging from 3D-localization to image registration. The main contributions of this work can be summarized as the introduction of two novel algorithms; one for symmetric graphs and one for graphs that are possibly asymmetric. Under general conditions, the former has guaranteed convergence to the solution of a spectral relaxation to the synchronization problem. The latter is stable for small step sizes when the graph is quasi-strongly connected. The proposed methods are verified in numerical simulations.
• In recent years, we have observed a significant trend towards filling the gap between social network analysis and control. This trend was enabled by the introduction of new mathematical models describing dynamics of social groups, the advancement in complex networks theory and multi-agent systems, and the development of modern computational tools for big data analysis. The aim of this tutorial is to highlight a novel chapter of control theory, dealing with applications to social systems, to the attention of the broad research community. This paper is the first part of the tutorial, and it is focused on the most classical models of social dynamics and on their relations to the recent achievements in multi-agent systems.
• A major challenge faced in the design of large-scale cyber-physical systems, such as power systems, the Internet of Things or intelligent transportation systems, is that traditional distributed optimal control methods do not scale gracefully, neither in controller synthesis nor in controller implementation, to systems composed of millions, billions or even trillions of interacting subsystems. This paper shows that this challenge can now be addressed by leveraging the recently introduced System Level Approach (SLA) to controller synthesis. In particular, in the context of the SLA, we define suitable notions of separability for control objective functions and system constraints such that the global optimization problem (or iterate update problems of a distributed optimization algorithm) can be decomposed into parallel subproblems. We then further show that if additional locality (i.e., sparsity) constraints are imposed, then these subproblems can be solved using local models and local decision variables. The SLA is essential to maintaining the convexity of the aforementioned problems under locality constraints. As a consequence, the resulting synthesis methods have $O(1)$ complexity relative to the size of the global system. We further show that many optimal control problems of interest, such as (localized) LQR and LQG, $\mathcal{H}_2$ optimal control with joint actuator and sensor regularization, and (localized) mixed $\mathcal{H}_2/\mathcal{L}_1$ optimal control problems, satisfy these notions of separability, and use these problems to explore tradeoffs in performance, actuator and sensing density, and average versus worst-case performance for a large-scale power inspired system.