- We present an approach for implementing a specific form of collaborative industrial practices, called Industrial Symbiotic Networks (ISNs), as MC-Net cooperative games and address the so-called ISN implementation problem. That is, the characteristics of ISNs may make fair and stable benefit allocation methods inapplicable even if the collaboration is collectively desired. Inspired by realistic ISN scenarios and the literature on normative multi-agent systems, we consider regulations and normative socioeconomic policies as two elements that, in combination with ISN games, resolve the situation and result in the concept of coordinated ISNs.
- Interest in emergent communication has recently surged in Machine Learning. The focus of this interest has largely been either on investigating the properties of the learned protocol or on utilizing emergent communication to better solve problems that already have a viable solution. Here, we consider self-driving cars coordinating with each other and focus on how communication influences the agents' collective behavior. Our main result is that communication helps (most) with adverse conditions.
- Apr 18 2018 cs.MA arXiv:1804.06011v1: Queen Daniela of Sardinia is asleep at the center of a round room at the top of the tower in her castle. She is accompanied by her faithful servant, Eva. Suddenly, they are awakened by cries of "Fire". The room is pitch black and they are disoriented. There is exactly one exit from the room somewhere along its boundary. They must find it as quickly as possible in order to save the life of the queen. It is known that with two people searching while moving at maximum speed 1 anywhere in the room, the room can be evacuated (i.e., with both people exiting) in $1 + \frac{2\pi}{3} + \sqrt{3} \approx 4.8264$ time units and this is optimal [Czyzowicz et al., DISC'14], assuming that the first person to find the exit can directly guide the other person to the exit using her voice. Somewhat surprisingly, in this paper we show that if the goal is to save the queen (possibly leaving Eva behind to die in the fire) there is a slightly better strategy. We prove that this "priority" version of evacuation can be solved in time at most $4.81854$. Furthermore, we show that any strategy for saving the queen requires time at least $3 + \pi/6 + \sqrt{3}/2 \approx 4.3896$ in the worst case. If one or both of the queen's other servants (Biddy and/or Lili) are with her, we show that the time bounds can be improved to $3.8327$ for two servants, and $3.3738$ for three servants. Finally we show lower bounds for these cases of $3.6307$ (two servants) and $3.2017$ (three servants). The case of $n\geq 4$ is the subject of an independent study by Queen Daniela's Royal Scientific Team.
- In the parable of Simon's Ant, an ant follows a complex path along a beach to reach its goal. The story shows how the interaction of simple rules and a complex environment results in complex behavior. But this relationship can be looked at in another way: given the path and the rules, we can infer the environment. With a large population of agents, human or animal, it should be possible to build a detailed map of a population's social and physical environment. In this abstract, we describe the development of a framework to create such maps of human belief space. These maps are built from the combined trajectories of a large number of agents. Currently, these maps are built using multidimensional agent-based simulation, but the framework is designed to work using data from computer-mediated human communication. Maps incorporating human data should support visualization and navigation of the "plains of research", "fashionable foothills" and "conspiracy cliffs" of human belief spaces.
- Distributed controllers are often necessary for a multi-agent system to satisfy safety properties such as collision avoidance. Communication and coordination are key requirements in the implementation of a distributed control protocol, but maintaining an all-to-all communication topology is unreasonable and not always necessary. Given a safety objective and a controller implementation, we consider the problem of identifying when agents need to communicate with one another and coordinate their actions to satisfy the safety constraint. We define a coordination-free controllable predecessor operator that is used to derive a subset of the state space that allows agents to act independently, without consulting other agents to double check that the action is safe. Applications are shown for identifying an upper bound on connection delays and a self-triggered coordination scheme. Examples are provided which showcase the potential for designers to visually interpret a system's ability to tolerate delays when initializing a network connection.
- Apr 13 2018 cs.MA arXiv:1804.04202v1: In this work, we present a programming paradigm that allows swarms to be controlled simply and with minimal communication bandwidth, while still allowing diverse complex behaviors and autonomy of the swarm to emerge. Communication in the proposed paradigm is based on single-bit "ping" signals propagating as information waves throughout the swarm. We show that even this minimum-bandwidth communication between agents suffices for the design of a substantial set of behaviors in the domain of essential behaviors of a collective, including locomotion and self-awareness of the swarm.
- The ability of algorithms to evolve or learn (compositional) communication protocols has traditionally been studied in the language evolution literature through the use of emergent communication tasks. Here we scale up this research by using contemporary deep learning methods and by training reinforcement-learning neural network agents on referential communication games. We extend previous work, in which agents were trained in symbolic environments, by developing agents which are able to learn from raw pixel data, a more challenging and realistic input representation. We find that the degree of structure found in the input data affects the nature of the emerged protocols, and thereby corroborate the hypothesis that structured compositional language is most likely to emerge when agents perceive the world as being structured.
- Multi-agent reinforcement learning offers a way to study how communication could emerge in communities of agents needing to solve specific problems. In this paper, we study the emergence of communication in the negotiation environment, a semi-cooperative model of agent interaction. We introduce two communication protocols: one grounded in the semantics of the game, and one which is a priori ungrounded and is a form of cheap talk. We show that self-interested agents can use the pre-grounded communication channel to negotiate fairly, but are unable to effectively use the ungrounded channel. However, prosocial agents do learn to use cheap talk to find an optimal negotiating strategy, suggesting that cooperation is necessary for language to emerge. We also study communication behaviour in a setting where one agent interacts with agents in a community with different levels of prosociality and show how agent identifiability can aid negotiation.
- We present a novel algorithm for computing collision-free navigation for heterogeneous road-agents such as cars, tricycles, bicycles, and pedestrians in dense traffic. Our approach currently assumes the positions, shapes, and velocities of all vehicles and pedestrians are known and computes smooth trajectories for each agent by taking into account the dynamic constraints. We describe an efficient optimization-based algorithm for each road-agent based on reciprocal velocity obstacles that takes into account kinematic and dynamic constraints. Our algorithm uses tight fitting shape representations based on medial axis to compute collision-free trajectories in dense traffic situations. We evaluate the performance of our algorithm in real-world dense traffic scenarios and highlight the benefits over prior reciprocal collision avoidance schemes.
- Decentralized (PO)MDPs provide an expressive framework for sequential decision making in a multiagent system. Given their computational complexity, recent research has focused on tractable yet practical subclasses of Dec-POMDPs. We address such a subclass called CDEC-POMDP where the collective behavior of a population of agents affects the joint-reward and environment dynamics. Our main contribution is an actor-critic (AC) reinforcement learning method for optimizing CDEC-POMDP policies. Vanilla AC has slow convergence for larger problems. To address this, we show how a particular decomposition of the approximate action-value function over agents leads to effective updates, and also derive a new way to train the critic based on local reward signals. Comparisons on a synthetic benchmark and a real-world taxi fleet optimization problem show that our new AC approach provides better quality solutions than previous best approaches.
- The social community of open source software developers has a complex network structure. This structure represents the relations between projects and engineers in the developer community. A project forms teams of engineers, who are grouped by task. SourceForge is one of the best-known open source websites. In the network, nodes represent engineers and arcs represent the connections among engineers on SourceForge. In a previous study, we found that a project's growth process strengthens with the number of developers joining the project. In the growth phase, we found characteristic patterns relating the number of agents to the number of projects produced. Based on these observations, we developed a simulation model of the project growth process. In this paper, we introduce altruistic behavior, as seen in the Army Ant model, into the software developer simulation model. We investigate the efficiency of the software development process through simulation experiments.
- We present a novel algorithm for reciprocal collision avoidance between heterogeneous agents of different shapes and sizes. We introduce a CTMAT representation, based on the medial axis transform, to compute a tight-fitting bounding shape for each agent. Each CTMAT is represented using tuples, which are composed of circular arcs and line segments. Based on the reciprocal velocity obstacle formulation, we reduce the problem to solving a low-dimensional linear programming problem between each pair of tuples belonging to adjacent agents. We precompute the Minkowski sums of tuples to accelerate the runtime performance. Finally, we provide an efficient method to update the orientation of each agent in a local manner. We have implemented the algorithm and highlight its performance on benchmarks corresponding to road traffic scenarios and different vehicles. The overall runtime performance is comparable to prior multi-agent collision avoidance algorithms that use circular or elliptical agents. Our approach is less conservative and results in fewer false collisions.
- This paper describes an agent-based simulation used to model human actions in belief space, a high-dimensional subset of information space associated with opinions. Using insights from animal collective behavior, we are able to simulate and identify behavior patterns that are similar to nomadic, flocking and stampeding patterns of animal groups. These behaviors have analogous manifestations in human interaction, emerging as solitary explorers, the fashion-conscious, and members of polarized echo chambers. We demonstrate that a small portion of nomadic agents that widely traverse belief space can disrupt a larger population of stampeding agents. Extending the model, we introduce the concept of Adversarial Herding, where bad actors can exploit properties of technologically mediated communication to artificially create self-sustaining runaway polarization. We call this condition the Pishkin Effect as it recalls the large-scale buffalo stampedes that could be created by Native American hunters. We then discuss opportunities for system design that could leverage the ability to recognize these negative patterns, and discuss affordances that may disrupt the formation of natural and deliberate echo chambers.
- Self-organization has been an important concept within a number of disciplines, which Artificial Life (ALife) also has heavily utilized since its inception. The term and its implications, however, are often confusing or misinterpreted. In this work, we provide a mini-review of self-organization and its relationship with ALife, aiming at initiating discussions on this important topic with the interested audience. We first articulate some fundamental aspects of self-organization, outline its usage, and review its applications to ALife within its soft, hard, and wet domains. We also provide perspectives for further research.
- Real-time strategy games have been an important field of game artificial intelligence in recent years. This paper presents a reinforcement learning and curriculum transfer learning method to control multiple units in StarCraft micromanagement. We define an efficient state representation, which breaks down the complexity caused by the large state space in the game environment. Then a parameter-sharing multi-agent gradient-descent Sarsa($\lambda$) (PS-MAGDS) algorithm is proposed to train the units. The learning policy is shared among our units to encourage cooperative behaviors. We use a neural network as a function approximator to estimate the action-value function, and propose a reward function to help units balance their move and attack. In addition, a transfer learning method is used to extend our model to more difficult scenarios, which accelerates the training process and improves the learning performance. In small-scale scenarios, our units successfully learn to combat and defeat the built-in AI with 100% win rates. In large-scale scenarios, a curriculum transfer learning method is used to progressively train a group of units, and shows superior performance over some baseline methods in target scenarios. With reinforcement learning and curriculum transfer learning, our units are able to learn appropriate strategies in StarCraft micromanagement scenarios.
- The authors present an overview of a hierarchical framework for coordinating task- and motion-level operations in multirobot systems. Their framework is based on the idea of using simple temporal networks to simultaneously reason about precedence/causal constraints required for task-level coordination and simple temporal constraints required to take some kinematic constraints of robots into account. In the plan-generation phase, the framework provides a computationally scalable method for generating plans that achieve high-level tasks for groups of robots and take some of their kinematic constraints into account. In the plan-execution phase, the framework provides a method for absorbing an imperfect plan execution to avoid time-consuming re-planning in many cases. The authors use the multirobot path-planning problem as a case study to present the key ideas behind their framework for the long-term autonomy of multirobot systems.
- In many real-world settings, a team of agents must coordinate their behaviour while acting in a decentralised way. At the same time, it is often possible to train the agents in a centralised fashion in a simulated or laboratory setting, where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an attractive way to exploit centralised learning, but the best strategy for then extracting decentralised policies is unclear. Our solution is QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion. QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations. We structurally enforce that the joint-action value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning, and guarantees consistency between the centralised and decentralised policies. We evaluate QMIX on a challenging set of StarCraft II micromanagement tasks, and show that QMIX significantly outperforms existing value-based multi-agent reinforcement learning methods.
- The emerging field of morphogenetic engineering proposes to design complex heterogeneous systems, focusing on the paradigm of emergence. Necessarily at the interface of disciplines, its concepts can be defined through multiple viewpoints. This contribution aims at linking a co-evolutionary perspective on such systems with morphogenesis, and thereby at bringing a novel conceptual approach to the bottom-up design of complex systems, one which allows co-evolutive processes to be fully considered. We first situate systems of interest at the interface between biological and social systems, and introduce a multidisciplinary perspective on co-evolution. Building on Holland's signals and boundaries theory of complex adaptive systems, we finally suggest that morphogenetic systems are equivalent to combinations of co-evolutionary niches. This introduces an entry to morphogenetic engineering focused on co-evolution between components of a system. Applications can be found in a broad range of subjects, which we illustrate with the example of planning in territorial systems, suggesting an extended scope for the relevance of morphogenetic engineering concepts.
- This paper develops an optimal relative output-feedback based solution to the containment control problem of linear heterogeneous multi-agent systems. A distributed optimal control protocol is presented for the followers that not only assures that their outputs fall into the convex hull of the leaders' output (i.e., the desired or safe region), but also optimizes their transient performance. The proposed optimal control solution is composed of a feedback part, depending on the followers' state, and a feed-forward part, depending on the convex hull of the leaders' state. To comply with most real-world applications, the feedback and feed-forward states are assumed to be unavailable and are estimated using two distributed observers. That is, since the followers cannot directly sense their absolute states, a distributed observer is designed that estimates their states using only relative output measurements with respect to their neighbors (measured, for example, by range sensors in robotics) and the information broadcast by their neighbors. Moreover, another adaptive distributed observer is designed that uses the exchange of information between followers over a communication network to estimate the convex hull of the leaders' state. The proposed observer relaxes the restrictive requirement that all followers have complete knowledge of the leaders' dynamics. An off-policy reinforcement learning algorithm on an actor-critic structure is next developed to solve the optimal containment control problem online, using relative output measurements and without requiring all followers to know the leaders' dynamics. Finally, the theoretical results are verified by numerical simulations.
- The amount of personal data collected in our everyday interactions with connected devices offers great opportunities for innovative services fueled by machine learning, but also raises serious concerns for the privacy of individuals. In this paper, we propose a massively distributed protocol for a large set of users to privately compute averages over their joint data, which can then be used to learn predictive models. Our protocol can find a solution of arbitrary accuracy, does not rely on a third party and preserves the privacy of users throughout the execution in both the honest-but-curious and malicious adversary models. Specifically, we prove that the information observed by the adversary (the set of malicious users) does not significantly reduce the uncertainty in its prediction of private values compared to its prior belief. The level of privacy protection depends on a quantity related to the Laplacian matrix of the network graph and generally improves with the size of the graph. Furthermore, we design a verification procedure which offers protection against malicious users joining the service with the goal of manipulating the outcome of the algorithm.
- Applications in robotics, such as multi-robot target tracking, involve the execution of information acquisition tasks by teams of mobile robots. However, in failure-prone or adversarial environments, robots get attacked, their communication channels get jammed, and their sensors fail, resulting in the withdrawal of robots from the collective task, and, subsequently, the inability of the remaining active robots to coordinate with each other. As a result, traditional design paradigms become insufficient and, in contrast, resilient designs against system-wide failures and attacks become important. In general, resilient design problems are hard, and even though they often involve objective functions that are monotone and (possibly) submodular, scalable approximation algorithms for their solution have been hitherto unknown. In this paper, we provide the first algorithm, enabling the following capabilities: minimal communication, i.e., the algorithm is executed by the robots based only on minimal communication between them, system-wide resiliency, i.e., the algorithm is valid for any number of denial-of-service attacks and failures, and provable approximation performance, i.e., the algorithm ensures for all monotone and (possibly) submodular objective functions a solution that is finitely close to the optimal. We support our theoretical analyses with simulated and real-world experiments, by considering an active information acquisition application scenario, namely, multi-robot target tracking.
- We consider a multi-agent framework for distributed optimization where each agent in the network has access to a local convex function and the collective goal is to achieve consensus on the parameters that minimize the sum of the agents' local functions. We propose an algorithm wherein each agent operates asynchronously and independently of the other agents in the network. When the local functions are strongly-convex with Lipschitz-continuous gradients, we show that a subsequence of the iterates at each agent converges to a neighbourhood of the global minimum, where the size of the neighbourhood depends on the degree of asynchrony in the multi-agent network. When the agents work at the same rate, convergence to the global minimizer is achieved. Numerical experiments demonstrate that Asynchronous Subgradient-Push can minimize the global objective faster than state-of-the-art synchronous first-order methods, is more robust to failing or stalling agents, and scales better with the network size.
- In this paper, we study the problem of distributed multi-agent optimization over a network, where each agent possesses a local cost function that is smooth and strongly convex. The global objective is to find a common solution that minimizes the average of all cost functions. Assuming agents only have access to unbiased estimates of the gradients of their local cost functions, we consider a distributed stochastic gradient tracking method. We show that, in expectation, the iterates generated by each agent are attracted to a neighborhood of the optimal solution, where they accumulate exponentially fast (under a constant step size choice). More importantly, the limiting (expected) error bounds on the distance of the iterates from the optimal solution decrease with the network size, which is a comparable performance to a centralized stochastic gradient algorithm. Numerical examples further demonstrate the effectiveness of the method.
- When scheduling public works or events in a shared facility one needs to accommodate preferences of a population. We formalize this problem by introducing the notion of a collective schedule. We show how to extend fundamental tools from social choice theory (positional scoring rules, the Kemeny rule and the Condorcet principle) to collective scheduling. We study the computational complexity of finding collective schedules. We also experimentally demonstrate that optimal collective schedules can be found for instances with realistic sizes.
- Apr 18 2018 cs.MA arXiv:1804.06311v1
- Apr 16 2018 cs.MA arXiv:1804.04746v1
- Mar 26 2018 cs.MA physics.soc-ph arXiv:1803.08867v1
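
The closed-form bounds quoted in the Queen Daniela abstract above (arXiv:1804.06011v1) can be checked numerically. The following is a minimal sketch that only evaluates the two expressions stated in that abstract and compares them with the quoted upper bound of 4.81854; the function and constant names are hypothetical, chosen for illustration, and nothing here reproduces the paper's search strategies or proofs.

```python
import math

# Upper bound for the "priority" (save-the-queen) version, copied from the abstract.
PRIORITY_UPPER_BOUND = 4.81854


def two_searcher_evacuation_time() -> float:
    """Evaluate 1 + 2*pi/3 + sqrt(3), the optimal time for both people to exit."""
    return 1 + 2 * math.pi / 3 + math.sqrt(3)


def priority_lower_bound() -> float:
    """Evaluate 3 + pi/6 + sqrt(3)/2, the lower bound for saving the queen."""
    return 3 + math.pi / 6 + math.sqrt(3) / 2


if __name__ == "__main__":
    full = two_searcher_evacuation_time()
    lower = priority_lower_bound()
    print(f"full evacuation (both exit): {full:.4f}")
    print(f"priority lower bound:        {lower:.4f}")
    # Difference between the classic evacuation time and the quoted priority bound.
    print(f"improvement of priority upper bound: {full - PRIORITY_UPPER_BOUND:.4f}")
```

The printed values should agree with the rounded figures in the abstract (4.8264 and 4.3896), and the quoted priority upper bound lies between the two, which is consistent with the abstract's description of the improvement as slight.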