# Multiagent Systems (cs.MA)

• In this work we examine recently proposed distance-based classification method designed for near-term quantum processing units with limited resources. We further study possibilities to reduce the quantum resources without any efficiency decrease. We show that only a part of the information undergoes coherent evolution and this fact allows us to introduce an algorithm with significantly reduced quantum memory size. Additionally, considering only partial information at a time, we propose a classification protocol with information distributed among a number of agents. Finally, we show that the information evolution during a measurement can lead to a better solution and that accuracy of the algorithm can be improved by harnessing the state after the final measurement.
• In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely: the function $F(\xb) \triangleq \sum_{i=1}^{m}f_i(\xb)$ is strongly convex and smooth, either strongly convex or smooth or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors) with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup such as proximal friendly functions, time-varying graphs, improvement of the condition numbers.
• The development and deployment of Autonomous Vehicles (AVs) on our roads is not only realistic in the near future but can also bring significant benefits. In particular, it can potentially solve several problems relating to vehicles and traffic, for instance: (i) possible reduction of traffic congestion, with the consequence of improved fuel economy and reduced driver inactivity; (ii) possible reduction in the number of accidents, assuming that an AV can minimise the human errors that often cause traffic accidents; and (iii) increased ease of parking, especially when one considers the potential for shared AVs. In order to deploy an AV there are significant steps that must be completed in terms of hardware and software. As expected, software components play a key role in the complex AV system and so, at least for safety, we should assess the correctness of these components. In this paper, we are concerned with the high-level software component(s) responsible for the decisions in an AV. We intend to model an AV capable of navigation; obstacle avoidance; obstacle selection (when a crash is unavoidable) and vehicle recovery, etc, using a rational agent. To achieve this, we have established the following stages. First, the agent plans and actions have been implemented within the Gwendolen agent programming language. Second, we have built a simulated automotive environment in the Java language. Third, we have formally specified some of the required agent properties through LTL formulae, which are then formally verified with the AJPF verification tool. Finally, within the MCAPL framework (which comprises all the tools used in previous stages) we have obtained formal verification of our AV agent in terms of its specific behaviours. For example, the agent plans responsible for selecting an obstacle with low potential damage, instead of a higher damage obstacle (when possible) can be formally verified within MCAPL. We must emphasise that the major goal (of our present approach) lies in the formal verification of agent plans, rather than evaluating real-world applications. For this reason we utilised a simple matrix representation concerning the environment used by our agent.
• Online matching problems have garnered significant attention in recent years due to numerous applications in e-commerce, online advertisements, ride-sharing, etc. Many of them capture the uncertainty in the real world by including stochasticity in both the arrival process and the matching process. The Online Stochastic Matching with Timeouts problem introduced by Bansal, et al., (Algorithmica, 2012) models matching markets (e.g., E-Bay, Amazon). Buyers arrive from an independent and identically distributed (i.i.d.) known distribution on buyer profiles and can be shown a list of items one at a time. Each buyer has some probability of purchasing each item and a limit (timeout) on the number of items they can be shown. Bansal et al., (Algorithmica, 2012) gave a 0.12-competitive algorithm which was improved by Adamczyk, et al., (ESA, 2015) to 0.24. We present an online attenuation framework that uses an algorithm for offline stochastic matching as a black box. On the upper bound side, we show that this framework, combined with a black-box adapted from Bansal et al., (Algorithmica, 2012), yields an online algorithm which nearly doubles the ratio to 0.46. On the lower bound side, we show that no algorithm can achieve a ratio better than 0.632 using the standard LP for this problem. This framework has a high potential for further improvements since new algorithms for offline stochastic matching can directly improve the ratio for the online problem. Our online framework also has the potential for a variety of extensions. For example, we introduce a natural generalization: Online Stochastic Matching with Two-sided Timeouts in which both online and offline vertices have timeouts. Our framework provides the first algorithm for this problem achieving a ratio of 0.30. We once again use the algorithm of Adamczyk et al., (ESA, 2015) as a black-box and plug-it into our framework.
• Humans interact with each other on a daily basis by developing and maintaining various social norms and it is critical to form a deeper understanding of how such norms develop, how they change, and how fast they change. In this work, we develop an evolutionary game-theoretic model based on research in cultural psychology that shows that humans in various cultures differ in their tendencies to conform with those around them. Using this model, we analyze the evolutionary relationships between the tendency to conform and how quickly a population reacts when conditions make a change in norm desirable. Our analysis identifies conditions when a tipping point is reached in a population, causing norms to change rapidly. To our knowledge, this is the first work analyzing the evolutionary causal relationship between the tendency to conform in a culture with tipping points during norm change.
• Agent-based IoT applications have recently been proposed in several domains, such as health care, smart cities and agriculture. Deploying these applications in specific settings has been very challenging for many reasons including the complex static and dynamic variability of the physical devices such as sensors and actuators, the software application behavior and the environment in which the application is embedded. In this paper, we propose a self-configurable IoT agent approach based on feedback-evaluative machine-learning. The approach involves: i) a variability model of IoT agents; ii) generation of sets of customized agents; iii) feedback evaluative machine learning; iv) modeling and composition of a group of IoT agents; and v) a feature-selection method based on manual and automatic feedback.
• Multi-agent Systems (MASs) have found a variety of industrial applications from economics to robotics, owing to their high adaptability, scalability and applicability. However, with the increasing complexity of MASs, multi-agent control has become a challenging problem to solve. Among different approaches to deal with this complex problem, game theoretic learning recently has received researchers' attention as a possible solution. In such learning scheme, by playing a game, each agent eventually discovers a solution on its own. The main focus of this paper is on enhancement of two types of game-theoretic learning algorithms: log linear learning and reinforcement learning. Each algorithm proposed in this paper, relaxes and imposes different assumptions to fit a class of MAS problems. Numerical experiments are also conducted to verify each algorithm's robustness and performance.
• We discuss the connection between computational social choice (comsoc) and computational complexity. We stress the work so far on, and urge continued focus on, two less-recognized aspects of this connection. Firstly, this is very much a two-way street: Everyone knows complexity classification is used in comsoc, but we also highlight benefits to complexity that have arisen from its use in comsoc. Secondly, more subtle, less-known complexity tools often can be very productively used in comsoc.
• The goal of this paper is to present an end-to-end, data-driven framework to control Autonomous Mobility-on-Demand systems (AMoD, i.e. fleets of self-driving vehicles). We first model the AMoD system using a time-expanded network, and present a formulation that computes the optimal rebalancing strategy (i.e., preemptive repositioning) and the minimum feasible fleet size for a given travel demand. Then, we adapt this formulation to devise a Model Predictive Control (MPC) algorithm that leverages short-term demand forecasts based on historical data to compute rebalancing strategies. We test the end-to-end performance of this controller with a state-of-the-art LSTM neural network to predict customer demand and real customer data from DiDi Chuxing: we show that this approach scales very well for large systems (indeed, the computational complexity of the MPC algorithm does not depend on the number of customers and of vehicles in the system) and outperforms state-of-the-art rebalancing strategies by reducing the mean customer wait time by up to to 89.6%.
• Decentralized control of robots has attracted huge research interests. However, some of the research used unrealistic assumptions without collision avoidance. This report focuses on the collision-free control for multiple robots in both complete coverage and search tasks in 2D and 3D areas which are arbitrary unknown. All algorithms are decentralized as robots have limited abilities and they are mathematically proved. The report starts with the grid selection in the two tasks. Grid patterns simplify the representation of the area and robots only need to move straightly between neighbor vertices. For the 100% complete 2D coverage, the equilateral triangular grid is proposed. For the complete coverage ignoring the boundary effect, the grid with the fewest vertices is calculated in every situation for both 2D and 3D areas. The second part is for the complete coverage in 2D and 3D areas. A decentralized collision-free algorithm with the above selected grid is presented driving robots to sections which are furthest from the reference point. The area can be static or expanding, and the algorithm is simulated in MATLAB. Thirdly, three grid-based decentralized random algorithms with collision avoidance are provided to search targets in 2D or 3D areas. The number of targets can be known or unknown. In the first algorithm, robots choose vacant neighbors randomly with priorities on unvisited ones while the second one adds the repulsive force to disperse robots if they are close. In the third algorithm, if surrounded by visited vertices, the robot will use the breadth-first search algorithm to go to one of the nearest unvisited vertices via the grid. The second search algorithm is verified on Pioneer 3-DX robots. The general way to generate the formula to estimate the search time is demonstrated. Algorithms are compared with five other algorithms in MATLAB to show their effectiveness.
• Due to the complexity of the natural world, a programmer cannot foresee all possible situations a connected and autonomous vehicle (CAV) will face during its operation, and hence, CAVs will need to learn to make decisions autonomously. Due to the sensing of its surroundings and information exchanged with other vehicles and road infrastructure a CAV will have access to large amounts of useful data. While different control algorithms have been proposed for CAVs, the benefits brought about by connectedness of autonomous vehicles to other vehicles and to the infrastructure, and its implications on policy learning has not been investigated in literature. This paper investigates a data driven driving policy learning framework through an agent-based modelling approaches. The contributions of the paper are two-fold. A dynamic programming framework is proposed for in-vehicle policy learning with and without connectivity to neighboring vehicles. The simulation results indicate that while a CAV can learn to make autonomous decisions, vehicle-to-vehicle (V2V) communication of information improves this capability. Furthermore, to overcome the limitations of sensing in a CAV, the paper proposes a novel concept for infrastructure-led policy learning and communication with autonomous vehicles. In infrastructure-led policy learning, road-side infrastructure senses and captures successful vehicle maneuvers and learns an optimal policy from those temporal sequences, and when a vehicle approaches the road-side unit, the policy is communicated to the CAV. Deep-imitation learning methodology is proposed to develop such an infrastructure-led policy learning framework.
• In this paper, we conduct an empirical study on discovering the ordered collective dynamics obtained by a population of artificial intelligence (AI) agents. Our intention is to put AI agents into a simulated natural context, and then to understand their induced dynamics at the population level. In particular, we aim to verify if the principles developed in the real world could also be used in understanding an artificially-created intelligent population. To achieve this, we simulate a large-scale predator-prey world, where the laws of the world are designed by only the findings or logical equivalence that have been discovered in nature. We endow the agents with the intelligence based on deep reinforcement learning, and scale the population size up to millions. Our results show that the population dynamics of AI agents, driven only by each agent's individual self interest, reveals an ordered pattern that is similar to the Lotka-Volterra model studied in population biology. We further discover the emergent behaviors of collective adaptations in studying how the agents' grouping behaviors will change with the environmental resources. Both of the two findings could be explained by the self-organization theory in nature.
• We introduce an axiomatic approach to group recommendations, in line of previous work on the axiomatic treatment of trust-based recommendation systems, ranking systems, and other foundational work on the axiomatic approach to internet mechanisms in social choice settings. In group recommendations we wish to recommend to a group of agents, consisting of both opinionated and undecided members, a joint choice that would be acceptable to them. Such a system has many applications, such as choosing a movie or a restaurant to go to with a group of friends, recommending games for online game players, & other communal activities. Our method utilizes a given social graph to extract information on the undecided, relying on the agents influencing them. We first show that a set of fairly natural desired requirements (a.k.a axioms) leads to an impossibility, rendering mutual satisfaction of them unreachable. However, we also show a modified set of axioms that fully axiomatize a group variant of the random-walk recommendation system, expanding a previous result from the individual recommendation case.
• We propose a novel Human-Swarm Interaction (HSI) framework which enables the user to control a swarm shape and formation. The user commands the swarm utilizing just arm gestures and motions which are recorded by an off-the-shelf wearable armband. We propose a novel interpreter system, which acts as an intermediary between the user and the swarm to simplify the user's role in the interaction. The interpreter takes in a high level input drawn using gestures by the user, and translates it into low level swarm control commands. This interpreter employs machine learning, Kalman filtering and optimal control techniques to translate the user input into swarm control parameters. A notion of Human Interpretable dynamics is introduced, which is used by the interpreter for planning as well as to provide feedback to the user. The dynamics of the swarm are controlled using a novel decentralized formation controller based on distributed linear iterations and dynamic average consensus. The framework is demonstrated theoretically as well as experimentally in a 2D environment, with a human controlling a swarm of simulated robots in real time.
• Apr 23 2018 cs.AI cs.MA arXiv:1804.07464v1
Delegation allows an agent to request that another agent completes a task. In many situations the task may be delegated onwards, and this process can repeat until it is eventually, successfully or unsuccessfully, performed. We consider policies to guide an agent in choosing who to delegate to when such recursive interactions are possible. These policies, based on quitting games and multi-armed bandits, were empirically tested for effectiveness. Our results indicate that the quitting game based policies outperform those which do not explicitly account for the recursive nature of delegation.
• We present an approach for implementing a specific form of collaborative industrial practices-called Industrial Symbiotic Networks (ISNs)-as MC-Net cooperative games and address the so called ISN implementation problem. This is, the characteristics of ISNs may lead to inapplicability of fair and stable benefit allocation methods even if the collaboration is a collectively desired one. Inspired by realistic ISN scenarios and the literature on normative multi-agent systems, we consider regulations and normative socioeconomic policies as two elements that in combination with ISN games resolve the situation and result in the concept of coordinated ISNs.
• Apr 20 2018 cs.MA cs.AI arXiv:1804.07178v1
Interest in emergent communication has recently surged in Machine Learning. The focus of this interest has largely been either on investigating the properties of the learned protocol or on utilizing emergent communication to better solve problems that already have a viable solution. Here, we consider self-driving cars coordinating with each other and focus on how communication influences the agents' collective behavior. Our main result is that communication helps (most) with adverse conditions.
• Apr 18 2018 cs.MA arXiv:1804.06011v1
Queen Daniela of Sardinia is asleep at the center of a round room at the top of the tower in her castle. She is accompanied by her faithful servant, Eva. Suddenly, they are awakened by cries of "Fire". The room is pitch black and they are disoriented. There is exactly one exit from the room somewhere along its boundary. They must find it as quickly as possible in order to save the life of the queen. It is known that with two people searching while moving at maximum speed 1 anywhere in the room, the room can be evacuated (i.e., with both people exiting) in $1 + \frac{2\pi}{3} + \sqrt{3} \approx 4.8264$ time units and this is optimal~[Czyzowicz et al., DISC'14], assuming that the first person to find the exit can directly guide the other person to the exit using her voice. Somewhat surprisingly, in this paper we show that if the goal is to save the queen (possibly leaving Eva behind to die in the fire) there is a slightly better strategy. We prove that this "priority" version of evacuation can be solved in time at most $4.81854$. Furthermore, we show that any strategy for saving the queen requires time at least $3 + \pi/6 + \sqrt{3}/2 \approx 4.3896$ in the worst case. If one or both of the queen's other servants (Biddy and/or Lili) are with her, we show that the time bounds can be improved to $3.8327$ for two servants, and $3.3738$ for three servants. Finally we show lower bounds for these cases of $3.6307$ (two servants) and $3.2017$ (three servants). The case of $n\geq 4$ is the subject of an independent study by Queen Daniela's Royal Scientific Team.
• In the parable of Simon's Ant, an ant follows a complex path along a beach on to reach its goal. The story shows how the interaction of simple rules and a complex environment result in complex behavior. But this relationship can be looked at in another way - given path and rules, we can infer the environment. With a large population of agents - human or animal - it should be possible to build a detailed map of a population's social and physical environment. In this abstract, we describe the development of a framework to create such maps of human belief space. These maps are built from the combined trajectories of a large number of agents. Currently, these maps are built using multidimensional agent-based simulation, but the framework is designed to work using data from computer-mediated human communication. Maps incorporating human data should support visualization and navigation of the "plains of research", "fashionable foothills" and "conspiracy cliffs" of human belief spaces.
• Distributed controllers are often necessary for a multi-agent system to satisfy safety properties such as collision avoidance. Communication and coordination are key requirements in the implementation of a distributed control protocol, but maintaining an all-to-all communication topology is unreasonable and not always necessary. Given a safety objective and a controller implementation, we consider the problem of identifying when agents need to communicate with one another and coordinate their actions to satisfy the safety constraint. We define a coordination-free controllable predecessor operator that is used to derive a subset of the state space that allows agents to act independently, without consulting other agents to double check that the action is safe. Applications are shown for identifying an upper bound on connection delays and a self-triggered coordination scheme. Examples are provided which showcase the potential for designers to visually interpret a system's ability to tolerate delays when initializing a network connection.
• In this work, we present a programming paradigm allowing the control of swarms with a minimum communication bandwidth in a simple manner, yet allowing the emergence of diverse complex behaviors and autonomy of the swarm. Communication in the proposed paradigm is based on single bit "ping"-signals propagating as information-waves throughout the swarm. We show that even this minimum bandwidth communication between agents suffices for the design of a substantial set of behaviors in the domain of essential behaviors of a collective, including locomotion and self awareness of the swarm.
• The ability of algorithms to evolve or learn (compositional) communication protocols has traditionally been studied in the language evolution literature through the use of emergent communication tasks. Here we scale up this research by using contemporary deep learning methods and by training reinforcement-learning neural network agents on referential communication games. We extend previous work, in which agents were trained in symbolic environments, by developing agents which are able to learn from raw pixel data, a more challenging and realistic input representation. We find that the degree of structure found in the input data affects the nature of the emerged protocols, and thereby corroborate the hypothesis that structured compositional language is most likely to emerge when agents perceive the world as being structured.
• Multi-agent reinforcement learning offers a way to study how communication could emerge in communities of agents needing to solve specific problems. In this paper, we study the emergence of communication in the negotiation environment, a semi-cooperative model of agent interaction. We introduce two communication protocols -- one grounded in the semantics of the game, and one which is \textita priori ungrounded and is a form of cheap talk. We show that self-interested agents can use the pre-grounded communication channel to negotiate fairly, but are unable to effectively use the ungrounded channel. However, prosocial agents do learn to use cheap talk to find an optimal negotiating strategy, suggesting that cooperation is necessary for language to emerge. We also study communication behaviour in a setting where one agent interacts with agents in a community with different levels of prosociality and show how agent identifiability can aid negotiation.
• We present a novel algorithm for computing collision-free navigation for heterogeneous road-agents such as cars, tricycles, bicycles, and pedestrians in dense traffic. Our approach currently assumes the positions, shapes, and velocities of all vehicles and pedestrians are known and computes smooth trajectories for each agent by taking into account the dynamic constraints. We describe an efficient optimization-based algorithm for each road-agent based on reciprocal velocity obstacles that takes into account kinematic and dynamic constraints. Our algorithm uses tight fitting shape representations based on medial axis to compute collision-free trajectories in dense traffic situations. We evaluate the performance of our algorithm in real-world dense traffic scenarios and highlight the benefits over prior reciprocal collision avoidance schemes.
• Decentralized (PO)MDPs provide an expressive framework for sequential decision making in a multiagent system. Given their computational complexity, recent research has focused on tractable yet practical subclasses of Dec-POMDPs. We address such a subclass called CDEC-POMDP where the collective behavior of a population of agents affects the joint-reward and environment dynamics. Our main contribution is an actor-critic (AC) reinforcement learning method for optimizing CDEC-POMDP policies. Vanilla AC has slow convergence for larger problems. To address this, we show how a particular decomposition of the approximate action-value function over agents leads to effective updates, and also derive a new way to train the critic based on local reward signals. Comparisons on a synthetic benchmark and a real-world taxi fleet optimization problem show that our new AC approach provides better quality solutions than previous best approaches.
• The social community in open source software developers has a complex network structure. The network structure represents the relations between the project and the engineer in the software developer's community. A project forms some teams which consist of engineers categorized into some task group. Source Forge is well known to be one of open source websites. The node and arc in the network structure means the engineer and their connection among engineers in the Source Forge. In the previous study, we found the growing process of project becomes strong according to the number of developers joining into the project. In the growing phase, we found some characteristic patterns between the number of agents and the produced projects. By such observations, we developed a simulation model of performing the growing process of project. In this paper, we introduced the altruism behavior as shown in the Army Ant model into the software developer's simulation model. The efficiency of the software developing process was investigated by some experimental simulation results.
• We present a novel algorithm for reciprocal collision avoidance between heterogeneous agents of different shapes and sizes. We present a novel CTMAT representation based on medial axis transform to compute a tight fitting bounding shape for each agent. Each CTMAT is represented using tuples, which are composed of circular arcs and line segments. Based on the reciprocal velocity obstacle formulation, we reduce the problem to solving a low-dimensional linear programming between each pair of tuples belonging to adjacent agents. We precompute the Minkowski Sums of tuples to accelerate the runtime performance. Finally, we provide an efficient method to update the orientation of each agent in a local manner. We have implemented the algorithm and highlight its performance on benchmarks corresponding to road traffic scenarios and different vehicles. The overall runtime performance is comparable to prior multi-agent collision avoidance algorithms that use circular or elliptical agents. Our approach is less conservative and results in fewer false collisions.
• This paper describes an agent based simulation used to model human actions in belief space, a high-dimensional subset of information space associated with opinions. Using insights from animal collective behavior, we are able to simulate and identify behavior patterns that are similar to nomadic, flocking and stampeding patterns of animal groups. These behaviors have analogous manifestations in human interaction, emerging as solitary explorers, the fashion-conscious, and members of polarized echo chambers. We demonstrate that a small portion of nomadic agents that widely traverse belief space can disrupt a larger population of stampeding agents. Extending the model, we introduce the concept of Adversarial Herding, where bad actors can exploit properties of technologically mediated communication to artificially create self sustaining runaway polarization. We call this condition the Pishkin Effect as it recalls the large scale buffalo stampedes that could be created by native Americans hunters. We then discuss opportunities for system design that could leverage the ability to recognize these negative patterns, and discuss affordances that may disrupt the formation of natural and deliberate echo chambers.
• Self-organization has been an important concept within a number of disciplines, which Artificial Life (ALife) also has heavily utilized since its inception. The term and its implications, however, are often confusing or misinterpreted. In this work, we provide a mini-review of self-organization and its relationship with ALife, aiming at initiating discussions on this important topic with the interested audience. We first articulate some fundamental aspects of self-organization, outline its usage, and review its applications to ALife within its soft, hard, and wet domains. We also provide perspectives for further research.
• Real-time strategy games have been an important field of game artificial intelligence in recent years. This paper presents a reinforcement learning and curriculum transfer learning method to control multiple units in StarCraft micromanagement. We define an efficient state representation, which breaks down the complexity caused by the large state space in the game environment. Then a parameter sharing multi-agent gradientdescent Sarsa(\lambda) (PS-MAGDS) algorithm is proposed to train the units. The learning policy is shared among our units to encourage cooperative behaviors. We use a neural network as a function approximator to estimate the action-value function, and propose a reward function to help units balance their move and attack. In addition, a transfer learning method is used to extend our model to more difficult scenarios, which accelerates the training process and improves the learning performance. In small scale scenarios, our units successfully learn to combat and defeat the built-in AI with 100% win rates. In large scale scenarios, curriculum transfer learning method is used to progressively train a group of units, and shows superior performance over some baseline methods in target scenarios. With reinforcement learning and curriculum transfer learning, our units are able to learn appropriate strategies in StarCraft micromanagement scenarios.
• The authors present an overview of a hierarchical framework for coordinating task- and motion-level operations in multirobot systems. Their framework is based on the idea of using simple temporal networks to simultaneously reason about precedence/causal constraints required for task-level coordination and simple temporal constraints required to take some kinematic constraints of robots into account. In the plan-generation phase, the framework provides a computationally scalable method for generating plans that achieve high-level tasks for groups of robots and take some of their kinematic constraints into account. In the plan-execution phase, the framework provides a method for absorbing an imperfect plan execution to avoid time-consuming re-planning in many cases. The authors use the multirobot path-planning problem as a case study to present the key ideas behind their framework for the long-term autonomy of multirobot systems.
• In many real-world settings, a team of agents must coordinate their behaviour while acting in a decentralised way. At the same time, it is often possible to train the agents in a centralised fashion in a simulated or laboratory setting, where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an attractive way to exploit centralised learning, but the best strategy for then extracting decentralised policies is unclear. Our solution is QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion. QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations. We structurally enforce that the joint-action value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning, and guarantees consistency between the centralised and decentralised policies. We evaluate QMIX on a challenging set of StarCraft II micromanagement tasks, and show that QMIX significantly outperforms existing value-based multi-agent reinforcement learning methods.
• Apr 02 2018 cs.MA cs.CY arXiv:1803.11457v1
The emerging field of morphogenetic engineering proposes to design complex heterogeneous system focused on the paradigm of emergence. Necessarily at the interface of disciplines, its concepts can be defined through multiple viewpoints. This contribution aims at linking a co-evolutionary perspective on such systems with morphogenesis, and therein at bringing a novel conceptual approach to the bottom-up design of complex systems which allows to fully consider co-evolutive processes. We first situate systems of interest at the interface between biological and social systems, and introduce a multidisciplinary perspective on co-evolution. Building on Holland's signals and boundaries theory of complex adaptive systems, we finally suggest that morphogenetic systems are equivalent to combinations of co-evolutionary niches. This introduces an entry to morphogenetic engineering focused on co-evolution between components of a system. Applications can be found in a broad range of subjects, which we illustrate with the example of planning in territorial systems, suggesting an extended scope for the relevance of morphogenetic engineering concepts.
• This paper develops an optimal relative output-feedback based solution to the containment control problem of linear heterogeneous multi-agent systems. A distributed optimal control protocol is presented for the followers to not only assure that their outputs fall into the convex hull of the leaders' output (i.e., the desired or safe region), but also optimizes their transient performance. The proposed optimal control solution is composed of a feedback part, depending of the followers' state, and a feed-forward part, depending on the convex hull of the leaders' state. To comply with most real-world applications, the feedback and feed-forward states are assumed to be unavailable and are estimated using two distributed observers. That is, since the followers cannot directly sense their absolute states, a distributed observer is designed that uses only relative output measurements with respect to their neighbors (measured for example by using range sensors in robotic) and the information which is broadcasted by their neighbors to estimate their states. Moreover, another adaptive distributed observer is designed that uses exchange of information between followers over a communication network to estimate the convex hull of the leaders' state. The proposed observer relaxes the restrictive requirement of knowing the complete knowledge of the leaders' dynamics by all followers. An off-policy reinforcement learning algorithm on an actor-critic structure is next developed to solve the optimal containment control problem online, using relative output measurements and without requirement of knowing the leaders' dynamics by all followers. Finally, the theoretical results are verified by numerical simulations.
• The amount of personal data collected in our everyday interactions with connected devices offers great opportunities for innovative services fueled by machine learning, as well as raises serious concerns for the privacy of individuals. In this paper, we propose a massively distributed protocol for a large set of users to privately compute averages over their joint data, which can then be used to learn predictive models. Our protocol can find a solution of arbitrary accuracy, does not rely on a third party and preserves the privacy of users throughout the execution in both the honest-but-curious and malicious adversary models. Specifically, we prove that the information observed by the adversary (the set of maliciours users) does not significantly reduce the uncertainty in its prediction of private values compared to its prior belief. The level of privacy protection depends on a quantity related to the Laplacian matrix of the network graph and generally improves with the size of the graph. Furthermore, we design a verification procedure which offers protection against malicious users joining the service with the goal of manipulating the outcome of the algorithm.
• Applications in robotics, such as multi-robot target tracking, involve the execution of information acquisition tasks by teams of mobile robots. However, in failure-prone or adversarial environments, robots get attacked, their communication channels get jammed, and their sensors fail, resulting in the withdrawal of robots from the collective task, and, subsequently, the inability of the remaining active robots to coordinate with each other. As a result, traditional design paradigms become insufficient and, in contrast, resilient designs against system-wide failures and attacks become important. In general, resilient design problems are hard, and even though they often involve objective functions that are monotone and (possibly) submodular, scalable approximation algorithms for their solution have been hitherto unknown. In this paper, we provide the first algorithm, enabling the following capabilities: minimal communication, i.e., the algorithm is executed by the robots based only on minimal communication between them, system-wide resiliency, i.e., the algorithm is valid for any number of denial-of-service attacks and failures, and provable approximation performance, i.e., the algorithm ensures for all monotone and (possibly) submodular objective functions a solution that is finitely close to the optimal. We support our theoretical analyses with simulated and real-world experiments, by considering an active information acquisition application scenario, namely, multi-robot target tracking.
• Mar 28 2018 cs.MA cs.SY math.OC arXiv:1803.08950v1
We consider a multi-agent framework for distributed optimization where each agent in the network has access to a local convex function and the collective goal is to achieve consensus on the parameters that minimize the sum of the agents' local functions. We propose an algorithm wherein each agent operates asynchronously and independently of the other agents in the network. When the local functions are strongly-convex with Lipschitz-continuous gradients, we show that a subsequence of the iterates at each agent converges to a neighbourhood of the global minimum, where the size of the neighbourhood depends on the degree of asynchrony in the multi-agent network. When the agents work at the same rate, convergence to the global minimizer is achieved. Numerical experiments demonstrate that Asynchronous Subgradient-Push can minimize the global objective faster than state-of-the-art synchronous first-order methods, is more robust to failing or stalling agents, and scales better with the network size.
• In this paper, we study the problem of distributed multi-agent optimization over a network, where each agent possesses a local cost function that is smooth and strongly convex. The global objective is to find a common solution that minimizes the average of all cost functions. Assuming agents only have access to unbiased estimates of the gradients of their local cost functions, we consider a distributed stochastic gradient tracking method. We show that, in expectation, the iterates generated by each agent are attracted to a neighborhood of the optimal solution, where they accumulate exponentially fast (under a constant step size choice). More importantly, the limiting (expected) error bounds on the distance of the iterates from the optimal solution decrease with the network size, which is a comparable performance to a centralized stochastic gradient algorithm. Numerical examples further demonstrate the effectiveness of the method.
• When scheduling public works or events in a shared facility one needs to accommodate preferences of a population. We formalize this problem by introducing the notion of a collective schedule. We show how to extend fundamental tools from social choice theory---positional scoring rules, the Kemeny rule and the Condorcet principle---to collective scheduling. We study the computational complexity of finding collective schedules. We also experimentally demonstrate that optimal collective schedules can be found for instances with realistic sizes.
• In this paper, we review multi-agent collective behavior algorithms in the literature and classify them according to their underlying mathematical structure. For each mathematical technique, we identify the multi-agent coordination tasks it can be applied to, and we analyze its scalability, bandwidth use, and demonstrated maturity. We highlight how versatile techniques such as artificial potential functions can be used for applications ranging from low-level position control to high-level coordination and task allocation, we discuss possible reasons for the slow adoption of complex distributed coordination algorithms in the field, and we highlight areas for further research and development.
• Although multi-agent reinforcement learning can tackle systems of strategically interacting entities, it currently fails in scalability and lacks rigorous convergence guarantees. Crucially, learning in multi-agent systems can become intractable due to the explosion in the size of the state-action space as the number of agents increases. In this paper, we propose a method for computing closed-loop optimal policies in multi-agent systems that scales independently of the number of agents. This allows us to show, for the first time, successful convergence to optimal behaviour in systems with an unbounded number of interacting adaptive learners. Studying the asymptotic regime of N-player stochastic games, we devise a learning protocol that is guaranteed to converge to equilibrium policies even when the number of agents is extremely large. Our method is model-free and completely decentralised so that each agent need only observe its local state information and its realised rewards. We validate these theoretical results by showing convergence to Nash-equilibrium policies in applications from economics and control theory with thousands of strategically interacting agents.
• Residential location choice modeling is one of the substantial components of land use and transportation models. While numerous aggregated mathematical and statistical approaches have been developed to model the residence choice behavior of households, disaggregated approaches such as the agent-based modeling have shown interesting capabilities. In this article, a novel agent-based approach is developed to simulate the residential location choice of tenants in Tehran, the capital of Iran. Tenants are considered as agents who select their desired residential alternatives according to their characteristics and preferences for various criteria such as the rent, accessibility to different services and facilities, environmental pollution, and distance from their workplace and former residence. The choice set of agents is limited to their desired residential alternatives by applying a constrained NSGA-II algorithm. Then, agents compete with each other to select their final residence among their alternatives. Results of the proposed approach are validated by comparing simulated and actual residences of a sample of tenants. Results show that the proposed approach is able to accurately simulate the residence of 59.3% of tenants at the traffic analysis zone level.
• Increasing energy efficiency in buildings can reduce costs and emissions substantially. Historically, this has been treated as a local, or single-agent, optimization problem. However, many buildings utilize the same types of thermal equipment e.g. electric heaters and hot water vessels. During operation, occupants in these buildings interact with the equipment differently thereby driving them to diverse regions in the state-space. Reinforcement learning agents can learn from these interactions, recorded as sensor data, to optimize the overall energy efficiency. However, if these agents operate individually at a household level, they can not exploit the replicated structure in the problem. In this paper, we demonstrate that this problem can indeed benefit from multi-agent collaboration by making use of targeted exploration of the state-space allowing for better generalization. We also investigate trade-offs between integrating human knowledge and additional sensors. Results show that savings of over 40% are possible with collaborative multi-agent systems making use of either expert knowledge or additional sensors with no loss of occupant comfort. We find that such multi-agent systems comfortably outperform comparable single agent systems.
• Recent work from the reinforcement learning community has shown that Evolution Strategies are a fast and scalable alternative to other reinforcement learning methods. In this paper we show that Evolution Strategies are a special case of model-based stochastic search methods. This class of algorithms has nice asymptotic convergence properties and known convergence rates. We show how these methods can be used to solve both cooperative and competitive multi-agent problems in an efficient manner. We demonstrate the effectiveness of this approach on two complex multi-agent UAV swarm combat scenarios: where a team of fixed wing aircraft must attack a well-defended base, and where two teams of agents go head to head to defeat each other.
• The Iterated Prisoner's Dilemma has guided research on social dilemmas for decades. However, it distinguishes between only two atomic actions: cooperate and defect. In real-world prisoner's dilemmas, these choices are temporally extended and different strategies may correspond to sequences of actions, reflecting grades of cooperation. We introduce a Sequential Prisoner's Dilemma (SPD) game to better capture the aforementioned characteristics. In this work, we propose a deep multiagent reinforcement learning approach that investigates the evolution of mutual cooperation in SPD games. Our approach consists of two phases. The first phase is offline: it synthesizes policies with different cooperation degrees and then trains a cooperation degree detection network. The second phase is online: an agent adaptively selects its policy based on the detected degree of opponent cooperation. The effectiveness of our approach is demonstrated in two representative SPD 2D games: the Apple-Pear game and the Fruit Gathering game. Experimental results show that our strategy can avoid being exploited by exploitative opponents and achieve cooperation with cooperative opponents.
• In systems of multiple agents, identifying the cause of observed agent dynamics is challenging. Often, these agents operate in diverse, non-stationary environments, where models rely on hand-crafted environment-specific features to infer influential regions in the system's surroundings. To overcome the limitations of these inflexible models, we present GP-LAPLACE, a technique for locating sources and sinks from trajectories in time-varying fields. Using Gaussian processes, we jointly infer a spatio-temporal vector field, as well as canonical vector calculus operations on that field. Notably, we do this from only agent trajectories without requiring knowledge of the environment, and also obtain a metric for denoting the significance of inferred causal features in the environment by exploiting our probabilistic method. To evaluate our approach, we apply it to both synthetic and real-world GPS data, demonstrating the applicability of our technique in the presence of multiple agents, as well as its superiority over existing methods.
• The impression of free will is the feeling according to which our choices are neither imposed from our inside nor from outside. It is the sense we are the ultimate cause of our acts. In direct opposition with the universal determinism, the existence of free will continues to be discussed. In this paper, free will is linked to a decisional mechanism: an agent is provided with free will if having performed a predictable choice Cp, it can immediately perform another choice Cr in a random way. The intangible feeling of free will is replaced by a decision-making process including a predictable decision-making process immediately followed by an unpredictable decisional one.
• We consider the problem of \emphfully decentralized multi-agent reinforcement learning (MARL), where the agents are located at the nodes of a time-varying communication network. Specifically, we assume that the reward functions of the agents might correspond to different tasks, and are only known to the corresponding agent. Moreover, each agent makes individual decisions based on both the information observed locally and the messages received from its neighbors over the network. Within this setting, the collective goal of the agents is to maximize the globally averaged return over the network through exchanging information with their neighbors. To this end, we propose two decentralized actor-critic algorithms with function approximation, which are applicable to large-scale MARL problems where both the number of states and the number of agents are massively large. Under the decentralized structure, the actor step is performed individually by each agent with no need to infer the policies of others. For the critic step, we propose a consensus update via communication over the network. Our algorithms are fully incremental and can be implemented in an online fashion. Convergence analyses of the algorithms are provided when the value functions are approximated within the class of linear functions. Extensive simulation results with both linear and nonlinear function approximations are presented to validate the proposed algorithms. Our work appears to be the first study of fully decentralized MARL algorithms for networked agents with function approximation, with provable convergence guarantees.
• Recently, multiagent deep reinforcement learning (DRL) has received increasingly wide attention. Existing multiagent DRL algorithms are inefficient when facing with the non-stationarity due to agents update their policies simultaneously in stochastic cooperative environments. This paper extends the recently proposed weighted double estimator to the multiagent domain and propose a multiagent DRL framework, named weighted double deep Q-network (WDDQN). By utilizing the weighted double estimator and the deep neural network, WDDQN can not only reduce the bias effectively but also be extended to scenarios with raw visual inputs. To achieve efficient cooperation in the multiagent domain, we introduce the lenient reward network and the scheduled replay strategy. Experiments show that the WDDQN outperforms the existing DRL and multiaent DRL algorithms, i.e., double DQN and lenient Q-learning, in terms of the average reward and the convergence rate in stochastic cooperative environments.
• Linear-Quadratic-Gaussian (LQG) control is concerned with the design of an optimal controller and estimator for linear Gaussian systems with imperfect state information. Standard LQG control assumes the set of sensor measurements to be fed to the estimator to be given. However, in many problems in networked systems and robotics, one is interested in designing a suitable set of sensors for LQG control. In this paper, we introduce the LQG control and sensing co-design problem, where one has to jointly design a suitable sensing, estimation, and control policy. We consider two dual instances of the co-design problem: the sensing-constrained LQG control problem, where the design maximizes the control performance subject to sensing constraints, and the minimum-sensing LQG control, where the design minimizes the amount of sensing subject to performance constraints. We focus on the case in which the sensing design has to be selected among a finite set of possible sensing modalities, where each modality is associated with a different cost. While we observe that the computation of the optimal sensing design is intractable in general, we present the first scalable LQG co-design algorithms to compute near-optimal policies with provable sub-optimality guarantees. To this end, (i) we show that a separation principle holds, which partially decouples the design of sensing, estimation, and control; (ii) we frame LQG co-design as the optimization of (approximately) supermodular set functions; (iii) we develop novel algorithms to solve the resulting optimization problems; (iv) we prove original results on the performance of these algorithms and establish connections between their sub-optimality gap and control-theoretic quantities. We conclude the paper by discussing two practical applications of the co-design problem, namely, sensing-constrained formation control and resource-constrained robot navigation.