The existence of a coalition strategy to achieve a goal does not necessarily mean that the coalition has enough information to know how to follow the strategy. Neither does it mean that the coalition knows that such a strategy exists. The article studies an interplay between the distributed knowledge, coalition strategies, and coalition "know-how" strategies. The main technical result is a sound and complete trimodal logical system that describes the properties of this interplay.
Cooperative multi-agent systems can be naturally used to model many real world problems, such as network packet routing and the coordination of autonomous vehicles. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action, while keeping the other agents' actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of StarCraft unit micromanagement, using a decentralised variant with significant partial observability. COMA significantly improves average performance over other multi-agent actor-critic methods in this setting, and the best performing agents are competitive with state-of-the-art centralised controllers that get access to the full state.
This paper explores the use of Answer Set Programming (ASP) in solving Distributed Constraint Optimization Problems (DCOPs). The paper provides the following novel contributions: (1) It shows how one can formulate DCOPs as logic programs; (2) It introduces ASP-DPOP, the first DCOP algorithm that is based on logic programming; (3) It experimentally shows that ASP-DPOP can be up to two orders of magnitude faster than DPOP (its imperative programming counterpart) as well as solve some problems that DPOP fails to solve, due to memory limitations; and (4) It demonstrates the applicability of ASP in a wide array of multi-agent problems currently modeled as DCOPs. Under consideration in Theory and Practice of Logic Programming (TPLP).
In this paper we describe a simulation platform that supports studies on the impact of crime on urban mobility. We present an example of how this can be achieved by seeking to understand the effect, on the transport system, if users of this system decide to choose optimal routes of time between origins and destinations that normally follow. Based on real data from a large Brazilian metropolis, we found that the percentage of users who follow this policy is small. Most prefer to follow less efficient routes by making bus exchanges at terminals. This can be understood as an indication that the users of the transport system privilege the security factor.
This paper explores the Coevolutionary Optional Prisoner's Dilemma (COPD) game, which is a simple model to coevolve game strategy and link weights of agents playing the Optional Prisoner's Dilemma game. We consider a population of agents placed in a lattice grid with boundary conditions. A number of Monte Carlo simulations are performed to investigate the impacts of the COPD game on the emergence of cooperation. Results show that the coevolutionary rules enable cooperators to survive and even dominate, with the presence of abstainers in the population playing a key role in the protection of cooperators against exploitation from defectors. We observe that in adverse conditions such as when the initial population of abstainers is too scarce/abundant, or when the temptation to defect is very high, cooperation has no chance of emerging. However, when the simple coevolutionary rules are applied, cooperators flourish.
The computability power of a distributed computing model is determined by the communication media available to the processes, the timing assumptions about processes and communication, and the nature of failures that processes can suffer. In a companion paper we showed how dynamic epistemic logic can be used to give a formal semantics to a given distributed computing model, to capture precisely the knowledge needed to solve a distributed task, such as consensus. Furthermore, by moving to a dual model of epistemic logic defined by simplicial complexes, topological invariants are exposed, which determine task solvability. In this paper we show how to extend the setting above to include in the knowledge of the processes, knowledge about the model of computation itself. The extension describes the knowledge processes gain about the current execution, in problems where processes have no input values at all.
The analysis in Part I revealed interesting properties for subgradient learning algorithms in the context of stochastic optimization when gradient noise is present. These algorithms are used when the risk functions are non-smooth and involve non-differentiable components. They have been long recognized as being slow converging methods. However, it was revealed in Part I that the rate of convergence becomes linear for stochastic optimization problems, with the error iterate converging at an exponential rate $\alpha^i$ to within an $O(\mu)-$neighborhood of the optimizer, for some $\alpha \in (0,1)$ and small step-size $\mu$. The conclusion was established under weaker assumptions than the prior literature and, moreover, several important problems (such as LASSO, SVM, and Total Variation) were shown to satisfy these weaker assumptions automatically (but not the previously used conditions from the literature). These results revealed that sub-gradient learning methods have more favorable behavior than originally thought when used to enable continuous adaptation and learning. The results of Part I were exclusive to single-agent adaptation. The purpose of the current Part II is to examine the implications of these discoveries when a collection of networked agents employs subgradient learning as their cooperative mechanism. The analysis will show that, despite the coupled dynamics that arises in a networked scenario, the agents are still able to attain linear convergence in the stochastic case; they are also able to reach agreement within $O(\mu)$ of the optimizer.
In large-scale natural disasters, humans are likely to fail when they attempt to reach high-risk sites or act in search and rescue operations. Robots, however, outdo their counterparts in surviving the hazards and handling the search and rescue missions due to their multiple and diverse sensing and actuation capabilities. The dynamic formation of optimal coalition of these heterogeneous robots for cost efficiency is very challenging and research in the area is gaining more and more attention. In this paper, we propose a novel heuristic. Since the population of robots in large-scale disaster settings is very large, we rely on Quantum Multi-Objective Particle Swarm Optimization (QMOPSO). The problem is modeled as a multi-objective optimization problem. Simulations with different test cases and metrics, and comparison with other algorithms such as NSGA-II and SPEA-II are carried out. The experimental results show that the proposed algorithm outperforms the existing algorithms not only in terms of convergence but also in terms of diversity and processing time.
In reinforcement learning, agents learn by performing actions and observing their outcomes. Sometimes, it is desirable for a human operator to \textitinterrupt an agent in order to prevent dangerous situations from happening. Yet, as part of their learning process, agents may link these interruptions, that impact their reward, to specific states and deliberately avoid them. The situation is particularly challenging in a multi-agent context because agents might not only learn from their own past interruptions, but also from those of other agents. Orseau and Armstrong defined \emphsafe interruptibility for one learner, but their work does not naturally extend to multi-agent systems. This paper introduces \textitdynamic safe interruptibility, an alternative definition more suited to decentralized learning problems, and studies this notion in two learning frameworks: \textitjoint action learners and \textitindependent learners. We give realistic sufficient conditions on the learning algorithm to enable dynamic safe interruptibility in the case of joint action learners, yet show that these conditions are not sufficient for independent learners. We show however that if agents can detect interruptions, it is possible to prune the observations to ensure dynamic safe interruptibility even for independent learners.
We study the problem of cooperative inference where a group of agents interact over a network and seek to estimate a joint parameter that best explains a set of observations. Agents do not know the network topology or the observations of other agents. We explore a variational interpretation of the Bayesian posterior density, and its relation to the stochastic mirror descent algorithm, to propose a new distributed learning algorithm. We show that, under appropriate assumptions, the beliefs generated by the proposed algorithm concentrate around the true parameter exponentially fast. We provide explicit non-asymptotic bounds for the convergence rate. Moreover, we develop explicit and computationally efficient algorithms for observation models belonging to exponential families.
In this paper, we argue that the future of Artificial Intelligence research resides in two keywords: integration and embodiment. We support this claim by analyzing the recent advances of the field. Regarding integration, we note that the most impactful recent contributions have been made possible through the integration of recent Machine Learning methods (based in particular on Deep Learning and Recurrent Neural Networks) with more traditional ones (e.g. Monte-Carlo tree search, goal babbling exploration or addressable memory systems). Regarding embodiment, we note that the traditional benchmark tasks (e.g. visual classification or board games) are becoming obsolete as state-of-the-art learning algorithms approach or even surpass human performance in most of them, having recently encouraged the development of first-person 3D game platforms embedding realistic physics. Building upon this analysis, we first propose an embodied cognitive architecture integrating heterogenous sub-fields of Artificial Intelligence into a unified framework. We demonstrate the utility of our approach by showing how major contributions of the field can be expressed within the proposed framework. We then claim that benchmarking environments need to reproduce ecologically-valid conditions for bootstrapping the acquisition of increasingly complex cognitive skills through the concept of a cognitive arms race between embodied agents.
The usual epistemic S5 model for multi-agent systems is a Kripke graph, whose edges are labeled with the agents that do not distinguish between two states. We propose to uncover the higher dimensional information implicit in the Kripke graph, by using as a model its dual, a chromatic simplicial complex. For each state of the Kripke model there is a facet in the complex, with one vertex per agent. If an edge (u,v) is labeled with a set of agents S, the facets corresponding to u and v intersect in a simplex consisting of one vertex for each agent of S. Then we use dynamic epistemic logic to study how the simplicial complex epistemic model changes after the agents communicate with each other. We show that there are topological invariants preserved from the initial epistemic complex to the epistemic complex after an action model is applied, that depend on how reliable the communication is. In turn these topological properties determine the knowledge that the agents may gain after the communication happens.
We present AutonoVi:, a novel algorithm for autonomous vehicle navigation that supports dynamic maneuvers and satisfies traffic constraints and norms. Our approach is based on optimization-based maneuver planning that supports dynamic lane-changes, swerving, and braking in all traffic scenarios and guides the vehicle to its goal position. We take into account various traffic constraints, including collision avoidance with other vehicles, pedestrians, and cyclists using control velocity obstacles. We use a data-driven approach to model the vehicle dynamics for control and collision avoidance. Furthermore, our trajectory computation algorithm takes into account traffic rules and behaviors, such as stopping at intersections and stoplights, based on an arc-spline representation. We have evaluated our algorithm in a simulated environment and tested its interactive performance in urban and highway driving scenarios with tens of vehicles, pedestrians, and cyclists. These scenarios include jaywalking pedestrians, sudden stops from high speeds, safely passing cyclists, a vehicle suddenly swerving into the roadway, and high-density traffic where the vehicle must change lanes to progress more effectively.
Robust environment perception is essential for decision-making on robots operating in complex domains. Intelligent task execution requires principled treatment of uncertainty sources in a robot's observation model. This is important not only for low-level observations (e.g., accelerometer data), but also for high-level observations such as semantic object labels. This paper formalizes the concept of macro-observations in Decentralized Partially Observable Semi-Markov Decision Processes (Dec-POSMDPs), allowing scalable semantic-level multi-robot decision making. A hierarchical Bayesian approach is used to model noise statistics of low-level classifier outputs, while simultaneously allowing sharing of domain noise characteristics between classes. Classification accuracy of the proposed macro-observation scheme, called Hierarchical Bayesian Noise Inference (HBNI), is shown to exceed existing methods. The macro-observation scheme is then integrated into a Dec-POSMDP planner, with hardware experiments running onboard a team of dynamic quadrotors in a challenging domain where noise-agnostic filtering fails. To the best of our knowledge, this is the first demonstration of a real-time, convolutional neural net-based classification framework running fully onboard a team of quadrotors in a multi-robot decision-making domain.
Voting systems typically treat all voters equally. We argue that perhaps they should not: Voters who have supported good choices in the past should be given higher weight than voters who have supported bad ones. To develop a formal framework for desirable weighting schemes, we draw on no-regret learning. Specifically, given a voting rule, we wish to design a weighting scheme such that applying the voting rule, with voters weighted by the scheme, leads to choices that are almost as good as those endorsed by the best voter in hindsight. We derive possibility and impossibility results for the existence of such weighting schemes, depending on whether the voting rule and the weighting scheme are deterministic or randomized, as well as on the social choice axioms satisfied by the voting rule.
Deep neural networks coupled with fast simulation and improved computation have led to recent successes in the field of reinforcement learning (RL). However, most current RL-based approaches fail to generalize since: (a) the gap between simulation and real world is so large that policy-learning approaches fail to transfer; (b) even if policy learning is done in real world, the data scarcity leads to failed generalization from training to test scenarios (e.g., due to different friction or object masses). Inspired from H-infinity control methods, we note that both modeling errors and differences in training and test scenarios can be viewed as extra forces/disturbances in the system. This paper proposes the idea of robust adversarial reinforcement learning (RARL), where we train an agent to operate in the presence of a destabilizing adversary that applies disturbance forces to the system. The jointly trained adversary is reinforced -- that is, it learns an optimal destabilization policy. We formulate the policy learning as a zero-sum, minimax objective function. Extensive experiments in multiple environments (InvertedPendulum, HalfCheetah, Swimmer, Hopper and Walker2d) conclusively demonstrate that our method (a) improves training stability; (b) is robust to differences in training/test conditions; and c) outperform the baseline even in the absence of the adversary.
Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems. However, existing multi-agent RL methods typically scale poorly in the problem size. Therefore, a key challenge is to translate the success of deep learning on single-agent RL to the multi-agent setting. A key stumbling block is that independent Q-learning, the most popular multi-agent RL method, introduces nonstationarity that makes it incompatible with the experience replay memory on which deep RL relies. This paper proposes two methods that address this problem: 1) conditioning each agent's value function on a footprint that disambiguates the age of the data sampled from the replay memory and 2) using a multi-agent variant of importance sampling to naturally decay obsolete data. Results on a challenging decentralised variant of StarCraft unit micromanagement confirm that these methods enable the successful combination of experience replay with multi-agent RL.
Congestion problems are omnipresent in today's complex networks and represent a challenge in many research domains. In the context of Multi-agent Reinforcement Learning (MARL), approaches like difference rewards and resource abstraction have shown promising results in tackling such problems. Resource abstraction was shown to be an ideal candidate for solving large-scale resource allocation problems in a fully decentralized manner. However, its performance and applicability strongly depends on some, until now, undocumented assumptions. Two of the main congestion benchmark problems considered in the literature are: the Beach Problem Domain and the Traffic Lane Domain. In both settings the highest system utility is achieved when overcrowding one resource and keeping the rest at optimum capacity. We analyse how abstract grouping can promote this behaviour and how feasible it is to apply this approach in a real-world domain (i.e., what assumptions need to be satisfied and what knowledge is necessary). We introduce a new test problem, the Road Network Domain (RND), where the resources are no longer independent, but rather part of a network (e.g., road network), thus choosing one path will also impact the load on other paths having common road segments. We demonstrate the application of state-of-the-art MARL methods for this new congestion model and analyse their performance. RND allows us to highlight an important limitation of resource abstraction and show that the difference rewards approach manages to better capture and inform the agents about the dynamics of the environment.
Successful analysis of player skills in video games has important impacts on the process of enhancing player experience without undermining their continuous skill development. Moreover, player skill analysis becomes more intriguing in team-based video games because such form of study can help discover useful factors in effective team formation. In this paper, we consider the problem of skill decomposition in MOBA (MultiPlayer Online Battle Arena) games, with the goal to understand what player skill factors are essential for the outcome of a game match. To understand the construct of MOBA player skills, we utilize various skill-based predictive models to decompose player skills into interpretative parts, the impact of which are assessed in statistical terms. We apply this analysis approach on two widely known MOBAs, namely League of Legends (LoL) and Defense of the Ancients 2 (DOTA2). The finding is that base skills of in-game avatars, base skills of players, and players' champion-specific skills are three prominent skill components influencing LoL's match outcomes, while those of DOTA2 are mainly impacted by in-game avatars' base skills but not much by the other two.
This paper addresses tracking of a moving target in a multi-agent network. The target follows a linear dynamics corrupted by an adversarial noise, i.e., the noise is not generated from a statistical distribution. The location of the target at each time induces a global time-varying loss function, and the global loss is a sum of local losses, each of which is associated to one agent. Agents noisy observations could be nonlinear. We formulate this problem as a distributed online optimization where agents communicate with each other to track the minimizer of the global loss. We then propose a decentralized version of the Mirror Descent algorithm and provide the non-asymptotic analysis of the problem. Using the notion of dynamic regret, we measure the performance of our algorithm versus its offline counterpart in the centralized setting. We prove that the bound on dynamic regret scales inversely in the network spectral gap, and it represents the adversarial noise causing deviation with respect to the linear dynamics. Our result subsumes a number of results in the distributed optimization literature. Finally, in a numerical experiment, we verify that our algorithm can be simply implemented for multi-agent tracking with nonlinear observations.
In multi-robot systems where a central decision maker is specifying the movement of each individual robot, a communication failure can severely impair the performance of the system. This paper develops a motion strategy that allows robots to safely handle critical communication failures for such multi-robot architectures. For each robot, the proposed algorithm computes a time horizon over which collisions with other robots are guaranteed not to occur. These safe time horizons are included in the commands being transmitted to the individual robots. In the event of a communication failure, the robots execute the last received velocity commands for the corresponding safe time horizons leading to a provably safe open-loop motion strategy. The resulting algorithm is computationally effective and is agnostic to the task that the robots are performing. The efficacy of the strategy is verified in simulation as well as on a team of differential-drive mobile robots.
We consider a swarm of $n$ autonomous mobile robots, distributed on a 2-dimensional grid. A basic task for such a swarm is the gathering process: all robots have to gather at one (not predefined) place. The work in this paper is motivated by the following insight: On one side, for swarms of robots distributed in the 2-dimensional Euclidean space, several gathering algorithms are known for extremely simple robots that are oblivious, have bounded viewing radius, no compass, and no "flags" to communicate a status to others. On the other side, in case of the 2-dimensional grid, the only known gathering algorithms for robots with bounded viewing radius without compass, need to memorize a constant number of rounds and need flags. In this paper we contribute the, to the best of our knowledge, first gathering algorithm on the grid, which works for anonymous, oblivious robots with bounded viewing range, without any further means of communication and without any memory. We prove its correctness and an $O(n^2)$ time bound. This time bound matches those of the best known algorithms for the Euclidean plane mentioned above.
Matrix games like Prisoner's Dilemma have guided research on social dilemmas for decades. However, they necessarily treat the choice to cooperate or defect as an atomic action. In real-world social dilemmas these choices are temporally extended. Cooperativeness is a property that applies to policies, not elementary actions. We introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions. We analyze the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games we introduce here: 1. a fruit Gathering game and 2. a Wolfpack hunting game. We characterize how learned behavior in each domain changes as a function of environmental factors including resource abundance. Our experiments show how conflict can emerge from competition over shared resources and shed light on how the sequential nature of real world social dilemmas affects cooperation.
Interactions between vehicles and pedestrians have always been a major problem in traffic safety. Experienced human drivers are able to analyze the environment and choose driving strategies that will help them avoid crashes. What is not yet clear, however, is how automated vehicles will interact with pedestrians. This paper proposes a new method for evaluating the safety and feasibility of the driving strategy of automated vehicles when encountering unsignalized crossings. MobilEye sensors installed on buses in Ann Arbor, Michigan, collected data on 2,973 valid crossing events. A stochastic interaction model was then created using a multivariate Gaussian mixture model. This model allowed us to simulate the movements of pedestrians reacting to an oncoming vehicle when approaching unsignalized crossings, and to evaluate the passing strategies of automated vehicles. A simulation was then conducted to demonstrate the evaluation procedure.
Social conventions govern countless behaviors all of us engage in every day, from how we greet each other to the languages we speak. But how can shared conventions emerge spontaneously in the absence of a central coordinating authority? The Naming Game model shows that networks of locally interacting individuals can spontaneously self-organize to produce global coordination. Here, we provide a gentle introduction to the main features of the model, from the dynamics observed in homogeneously mixing populations to the role played by more complex social networks, and to how slight modifications of the basic interaction rules give origin to a richer phenomenology in which more conventions can co-exist indefinitely.
Opponent modeling consists in modeling the strategy or preferences of an agent thanks to the data it provides. In the context of automated negotiation and with machine learning, it can result in an advantage so overwhelming that it may restrain some casual agents to be part of the bargaining process. We qualify as "curious" an agent driven by the desire of negotiating in order to collect information and improve its opponent model. However, neither curiosity-based rational-ity nor curiosity-robust protocol have been studied in automatic negotiation. In this paper, we rely on mechanism design to propose three extensions of the standard bargaining protocol that limit information leak. Those extensions are supported by an enhanced rationality model, that considers the exchanged information. Also, they are theoretically analyzed and experimentally evaluated.
Algorithms for equilibrium computation generally make no attempt to ensure that the computed strategies are understandable by humans. For instance the strategies for the strongest poker agents are represented as massive binary files. In many situations, we would like to compute strategies that can actually be implemented by humans, who may have computational limitations and may only be able to remember a small number of features or components of the strategies that have been computed. We study poker games where private information distributions can be arbitrary. We create a large training set of game instances and solutions, by randomly selecting the information probabilities, and present algorithms that learn from the training instances in order to perform well in games with unseen information distributions. We are able to conclude several new fundamental rules about poker strategy that can be easily implemented by humans.
The paper studies the problem of achieving consensus in multi-agent systems in the case where the dependency digraph $\Gamma$ has no spanning in-tree. We consider the regularization protocol that amounts to the addition of a dummy agent (hub) uniformly connected to the agents. The presence of such a hub guarantees the achievement of an asymptotic consensus. For the "evaporation" of the dummy agent, the strength of its influences on the other agents vanishes, which leads to the concept of latent consensus. We obtain a closed-form expression for the consensus when the connections of the hub are symmetric, in this case, the impact of the hub upon the consensus remains fixed. On the other hand, if the hub is essentially influenced by the agents, whereas its influence on them tends to zero, then the consensus is expressed by the scalar product of the vector of column means of the Laplacian eigenprojection of $\Gamma$ and the initial state vector of the system. Another protocol, which assumes the presence of vanishingly weak uniform background links between the agents, leads to the same latent consensus.
We present a distributed (non-Bayesian) learning algorithm for the problem of parameter estimation with Gaussian noise. The algorithm is expressed as explicit updates on the parameters of the Gaussian beliefs (i.e. means and precision). We show a convergence rate of $O(1/k)$ with the constant term depending on the number of agents and the topology of the network. Moreover, we show almost sure convergence to the optimal solution of the estimation problem for the general case of time-varying directed graphs.
In this paper we extend the principle of proportional representation to rankings. We consider the setting where alternatives need to be ranked based on approval preferences. In this setting, proportional representation requires that cohesive groups of voters are represented proportionally in each initial segment of the ranking. Proportional rankings are desirable in situations where initial segments of different lengths may be relevant, e.g., hiring decisions (if it is unclear how many positions are to be filled), the presentation of competing proposals on a liquid democracy platform (if it is unclear how many proposals participants are taking into consideration), or recommender systems (if a ranking has to accommodate different user types). We study the proportional representation provided by several ranking methods and prove theoretical guarantees. Furthermore, we experimentally evaluate these methods and present preliminary evidence as to which methods are most suitable for producing proportional rankings.
In this paper, we discuss how to design the graph topology to reduce the communication complexity of certain algorithms for decentralized optimization. Our goal is to minimize the total communication needed to achieve a prescribed accuracy. We discover that the so-called expander graphs are near-optimal choices. We propose three approaches to construct expander graphs for different numbers of nodes and node degrees. Our numerical results show that the performance of decentralized optimization is significantly better on expander graphs than other regular graphs.
The computational study of elections generally assumes that the preferences of the electorate come in as a list of votes. Depending on the context, it may be much more natural to represent the list succinctly, as the distinct votes of the electorate and their counts. We consider how this succinct representation of the voters affects the computational complexity of election problems. Though the succinct representation may be exponentially smaller than the nonsuccinct representation, we find only one natural case where the complexity increases, namely the complexity of winner determination for Kemeny elections. This is in sharp contrast to the case where each voter has a weight, where the complexity usually increases.
Nov 24 2016 cs.MA
Published during a severe economic crisis, this study presents the first spatial microsimulation model for the analysis of income inequalities and poverty in Greece. First, we present a brief overview of the method and discuss its potential for the analysis of multidimensional poverty and income inequality in Greece. We then present the SimAthens model, based on a combination of small-area demographic and socioeconomic information available from the Greek census of population with data from the European Union Statistics on Income and Living Conditions (EU-SILC). The model is based on an iterative proportional fitting (IPF) algorithm, and is used to reweigh EU-SILC records to fit in small-area descriptions for Athens based on 2001 and 2011 censuses. This is achieved by using demographic and socioeconomic characteristics as constraint variables. Finally, synthesis of the labor market and occupations are chosen as the main variables for externally validating our results, in order to verify the integrity of the model. Results of this external validation process are found to be extremely satisfactory, indicating a high goodness of fit between simulated and real values. Finally, the study presents a number of model outputs, illustrating changes in social and economic geography, during a severe economic crisis, offering a great opportunity for discussing further potential of this model in policy analysis.
Logic-based representations of multi-agent systems have been extensively studied. In this work, we focus on the action language BC to formalize global views of MAS domains. Methodologically, we start representing the behaviour of each agent by an action description from a single agent perspective. Then, it goes through two stages that guide the modeler in composing the global view by first designating multi-agent aspects of the domain via potential conflicts and later resolving these conflicts according to the expected behaviour of the overall system. Considering that representing single agent descriptions is relatively simpler than representing multi-agent description directly, the formalization developed here is valuable from a knowledge representation perspective.
Acquiring your first language is an incredible feat and not easily duplicated. Learning to communicate using nothing but a few pictureless books, a corpus, would likely be impossible even for humans. Nevertheless, this is the dominating approach in most natural language processing today. As an alternative, we propose the use of situated interactions between agents as a driving force for communication, and the framework of Deep Recurrent Q-Networks for evolving a shared language grounded in the provided environment. We task the agents with interactive image search in the form of the game Guess Who?. The images from the game provide a non trivial environment for the agents to discuss and a natural grounding for the concepts they decide to encode in their communication. Our experiments show that the agents learn not only to encode physical concepts in their words, i.e. grounding, but also that the agents learn to hold a multi-step dialogue remembering the state of the dialogue from step to step.
Generative adversarial networks (GANs) are a framework for producing a generative model by way of a two-player minimax game. In this paper, we propose the \emphGenerative Multi-Adversarial Network (GMAN), a framework that extends GANs to multiple discriminators. In previous work, the successful training of GANs requires modifying the minimax objective to accelerate training early on. In contrast, GMAN can be reliably trained with the original, untampered objective. We explore a number of design perspectives with the discriminator role ranging from formidable adversary to forgiving teacher. Image generation tasks comparing the proposed framework to standard GANs demonstrate GMAN produces higher quality samples in a fraction of the iterations when measured by a pairwise GAM-type metric.
We consider election scenarios with incomplete information, a situation that arises often in practice. There are several models of incomplete information and accordingly, different notions of outcomes of such elections. In one well-studied model of incompleteness, the votes are given by partial orders over the candidates. In this context we can frame the problem of finding a possible winner, which involves determining whether a given candidate wins in at least one completion of a given set of partial votes for a specific voting rule. The possible winner problem is well-known to be NP-complete in general, and it is in fact known to be NP-complete for several voting rules where the number of undetermined pairs in every vote is bounded only by some constant. In this paper, we address the question of determining precisely the smallest number of undetermined pairs for which the possible winner problem remains NP-complete. In particular, we find the exact values of $t$ for which the possible winner problem transitions to being NP-complete from being in P, where $t$ is the maximum number of undetermined pairs in every vote. We demonstrate tight results for a broad subclass of scoring rules which includes all the commonly used scoring rules (such as plurality, veto, Borda, $k$-approval, and so on), Copeland$^\alpha$ for every $\alpha\in[0,1]$, maximin, and Bucklin voting rules. A somewhat surprising aspect of our results is that for many of these rules, the possible winner problem turns out to be hard even if every vote has at most one undetermined pair of candidates.
In participatory budgeting, communities collectively decide on the allocation of public tax dollars for local public projects. In this work, we consider the question of fairly aggregating the preferences of community members to determine an allocation of funds to projects. This problem is different from standard fair resource allocation because of public goods: The allocated goods benefit all users simultaneously. Fairness is crucial in participatory decision making, since generating equitable outcomes is an important goal of these processes. We argue that the classic game theoretic notion of core captures fairness in the setting. To compute the core, we first develop a novel characterization of a public goods market equilibrium called the Lindahl equilibrium, which is always a core solution. We then provide the first (to our knowledge) polynomial time algorithm for computing such an equilibrium for a broad set of utility functions; our algorithm also generalizes (in a non-trivial way) the well-known concept of proportional fairness. We use our theoretical insights to perform experiments on real participatory budgeting voting data. We empirically show that the core can be efficiently computed for utility functions that naturally model our practical setting, and examine the relation of the core with the familiar welfare objective. Finally, we address concerns of incentives and mechanism design by developing a randomized approximately dominant-strategy truthful mechanism building on the exponential mechanism from differential privacy.
This paper proposes models of learning process in teams of individuals who collectively execute a sequence of tasks and whose actions are determined by individual skill levels and networks of interpersonal appraisals and influence. The closely-related proposed models have increasing complexity, starting with a centralized manager-based assignment and learning model, and finishing with a social model of interpersonal appraisal, assignments, learning, and influences. We show how rational optimal behavior arises along the task sequence for each model, and discuss conditions of suboptimality. Our models are grounded in replicator dynamics from evolutionary games, influence networks from mathematical sociology, and transactive memory systems from organization science.
Sep 27 2016 cs.MA
Finding feasible, collision-free paths for multiagent systems can be challenging, particularly in non-communicating scenarios where each agent's intent (e.g. goal) is unobservable to the others. In particular, finding time efficient paths often requires anticipating interaction with neighboring agents, the process of which can be computationally prohibitive. This work presents a decentralized multiagent collision avoidance algorithm based on a novel application of deep reinforcement learning, which effectively offloads the online computation (for predicting interaction patterns) to an offline learning procedure. Specifically, the proposed approach develops a value network that encodes the estimated time to the goal given an agent's joint configuration (positions and velocities) with its neighbors. Use of the value network not only admits efficient (i.e., real-time implementable) queries for finding a collision-free velocity vector, but also considers the uncertainty in the other agents' motion. Simulation results show more than 26 percent improvement in paths quality (i.e., time to reach the goal) when compared with optimal reciprocal collision avoidance (ORCA), a state-of-the-art collision avoidance strategy.
We overview some results on distributed learning with focus on a family of recently proposed algorithms known as non-Bayesian social learning. We consider different approaches to the distributed learning problem and its algorithmic solutions for the case of finitely many hypotheses. The original centralized problem is discussed at first, and then followed by a generalization to the distributed setting. The results on convergence and convergence rate are presented for both asymptotic and finite time regimes. Various extensions are discussed such as those dealing with directed time-varying networks, Nesterov's acceleration technique and a continuum sets of hypothesis.
Studies on microscopic pedestrian requires large amounts of trajectory data from real-world pedestrian crowds. Such data collection, if done manually, needs tremendous effort and is very time consuming. Though many studies have asserted the possibility of automating this task using video cameras, we found that only a few have demonstrated good performance in very crowded situations or from a top-angled view scene. This paper deals with tracking pedestrian crowd under heavy occlusions from an angular scene. Our automated tracking system consists of two modules that perform sequentially. The first module detects moving objects as blobs. The second module is a tracking system. We employ probability distribution from the detection of each pedestrian and use Bayesian update to track the next position. The result of such tracking is a database of pedestrian trajectories over time and space. With certain prior information, we showed that the system can track a large number of people under occlusion and clutter scene.
We study the computational complexity of several scenarios of strategic behavior for the Kemeny procedure in the setting of judgment aggregation. In particular, we investigate (1) manipulation, where an individual aims to achieve a better group outcome by reporting an insincere individual opinion, (2) bribery, where an external agent aims to achieve an outcome with certain properties by bribing a number of individuals, and (3) control (by adding or deleting issues), where an external agent aims to achieve an outcome with certain properties by influencing the set of issues in the judgment aggregation situation. We show that determining whether these types of strategic behavior are possible (and if so, computing a policy for successful strategic behavior) is complete for the second level of the Polynomial Hierarchy. That is, we show that these problems are $\Sigma^p_2$-complete.
In this paper, we develop an agent-based model which integrates four important elements, i.e. organisational energy management policies/regulations, energy management technologies, electric appliances and equipment, and human behaviour, based on a case study, to simulate the energy consumption in office buildings. With the model, we test the effectiveness of different energy management strategies, and solve practical office energy consumption problems. This paper theoretically contributes to an integration of four elements involved in the complex organisational issue of office energy consumption, and practically contributes to an application of agent-based approach for office building energy consumption study.
A framework for consensus modelling is introduced using Kleene's three valued logic as a means to express vagueness in agents' beliefs. Explicitly borderline cases are inherent to propositions involving vague concepts where sentences of a propositional language may be absolutely true, absolutely false or borderline. By exploiting these intermediate truth values, we can allow agents to adopt a more vague interpretation of underlying concepts in order to weaken their beliefs and reduce the levels of inconsistency, so as to achieve consensus. We consider a consensus combination operation which results in agents adopting the borderline truth value as a shared viewpoint if they are in direct conflict. Simulation experiments are presented which show that applying this operator to agents chosen at random (subject to a consistency threshold) from a population, with initially diverse opinions, results in convergence to a smaller set of more precise shared beliefs. Furthermore, if the choice of agents for combination is dependent on the payoff of their beliefs, this acting as a proxy for performance or usefulness, then the system converges to beliefs which, on average, have higher payoff.
We consider the positioning problem of aerial drone systems for efficient three-dimensional (3-D) coverage. Our solution draws from molecular geometry, where forces among electron pairs surrounding a central atom arrange their positions. In this paper, we propose a 3-D clustering algorithm for autonomous positioning (VBCA) of aerial drone networks based on virtual forces. These virtual forces induce interactions among drones and structure the system topology. The advantages of our approach are that (1) virtual forces enable drones to self-organize the positioning process and (2) VBCA can be implemented entirely localized. Extensive simulations show that our virtual forces clustering approach produces scalable 3-D topologies exhibiting near-optimal volume coverage. VBCA triggers efficient topology rearrangement for an altering number of nodes, while providing network connectivity to the central drone. We also draw a comparison of volume coverage achieved by VBCA against existing approaches and find VBCA up to 40\% more efficient.
A true lie is a lie that becomes true when announced. In a logic of announcements, where the announcing agent is not modelled, a true lie is a formula (that is false and) that becomes true when announced. We investigate true lies and other types of interaction between announced formulas, their preconditions and their postconditions, in the setting of Gerbrandy's logic of believed announcements, wherein agents may have or obtain incorrect beliefs. Our results are on the satisfiability and validity of instantiations of these semantically defined categories, on iterated announcements, including arbitrarily often iterated announcements, and on syntactic characterization. We close with results for iterated announcements in the logic of knowledge (instead of belief), and for lying as private announcements (instead of public announcements) to different agents. Detailed examples illustrate our lying concepts.
Extensive work has been conducted both in game theory and logic to model strategic interaction. An important question is whether we can use these theories to design agents for interacting with people? On the one hand, they provide a formal design specification for agent strategies. On the other hand, people do not necessarily adhere to playing in accordance with these strategies, and their behavior is affected by a multitude of social and psychological factors. In this paper we will consider the question of whether strategies implied by theories of strategic behavior can be used by automated agents that interact proficiently with people. We will focus on automated agents that we built that need to interact with people in two negotiation settings: bargaining and deliberation. For bargaining we will study game-theory based equilibrium agents and for argumentation we will discuss logic-based argumentation theory. We will also consider security games and persuasion games and will discuss the benefits of using equilibrium based agents.
We consider goods that can be shared with k-hop neighbors (i.e., the set of nodes within k hops from an owner) on a social network. We examine incentives to buy such a good by devising game-theoretic models where each node decides whether to buy the good or free ride. First, we find that social inefficiency, specifically excessive purchase of the good, occurs in Nash equilibria. Second, the social inefficiency decreases as k increases and thus a good can be shared with more nodes. Third, and most importantly, the social inefficiency can also be significantly reduced by charging free riders an access cost and paying it to owners, leading to the conclusion that organizations and system designers should impose such a cost. These findings are supported by our theoretical analysis in terms of the price of anarchy and the price of stability; and by simulations based on synthetic and real social networks.
This paper proposes three different distributed event-triggered control algorithms to achieve leader-follower consensus for a network of Euler-Lagrange agents. We firstly propose two model-independent algorithms for a subclass of Euler-Lagrange agents without the vector of gravitational potential forces. By model-independent, we mean that each agent can execute its algorithm with no knowledge of the agent self-dynamics. A variable-gain algorithm is employed when the sensing graph is undirected; algorithm parameters are selected in a fully distributed manner with much greater flexibility compared to all previous work concerning event-triggered consensus problems. When the sensing graph is directed, a constant-gain algorithm is employed. The control gains must be centrally designed to exceed several lower bounding inequalities which require limited knowledge of bounds on the matrices describing the agent dynamics, bounds on network topology information and bounds on the initial conditions. When the Euler-Lagrange agents have dynamics which include the vector of gravitational potential forces, an adaptive algorithm is proposed which requires more information about the agent dynamics but can estimate uncertain agent parameters. For each algorithm, a trigger function is proposed to govern the event update times. At each event, the controller is updated, which ensures that the control input is piecewise constant and saves energy resources. We analyse each controllers and trigger function and exclude Zeno behaviour. Extensive simulations show 1) the advantages of our proposed trigger function as compared to those in existing literature, and 2) the effectiveness of our proposed controllers.