We present AutonoVi:, a novel algorithm for autonomous vehicle navigation that supports dynamic maneuvers and satisfies traffic constraints and norms. Our approach is based on optimization-based maneuver planning that supports dynamic lane-changes, swerving, and braking in all traffic scenarios and guides the vehicle to its goal position. We take into account various traffic constraints, including collision avoidance with other vehicles, pedestrians, and cyclists using control velocity obstacles. We use a data-driven approach to model the vehicle dynamics for control and collision avoidance. Furthermore, our trajectory computation algorithm takes into account traffic rules and behaviors, such as stopping at intersections and stoplights, based on an arc-spline representation. We have evaluated our algorithm in a simulated environment and tested its interactive performance in urban and highway driving scenarios with tens of vehicles, pedestrians, and cyclists. These scenarios include jaywalking pedestrians, sudden stops from high speeds, safely passing cyclists, a vehicle suddenly swerving into the roadway, and high-density traffic where the vehicle must change lanes to progress more effectively.
Robust environment perception is essential for decision-making on robots operating in complex domains. Intelligent task execution requires principled treatment of uncertainty sources in a robot's observation model. This is important not only for low-level observations (e.g., accelerometer data), but also for high-level observations such as semantic object labels. This paper formalizes the concept of macro-observations in Decentralized Partially Observable Semi-Markov Decision Processes (Dec-POSMDPs), allowing scalable semantic-level multi-robot decision making. A hierarchical Bayesian approach is used to model noise statistics of low-level classifier outputs, while simultaneously allowing sharing of domain noise characteristics between classes. Classification accuracy of the proposed macro-observation scheme, called Hierarchical Bayesian Noise Inference (HBNI), is shown to exceed existing methods. The macro-observation scheme is then integrated into a Dec-POSMDP planner, with hardware experiments running onboard a team of dynamic quadrotors in a challenging domain where noise-agnostic filtering fails. To the best of our knowledge, this is the first demonstration of a real-time, convolutional neural net-based classification framework running fully onboard a team of quadrotors in a multi-robot decision-making domain.
Voting systems typically treat all voters equally. We argue that perhaps they should not: Voters who have supported good choices in the past should be given higher weight than voters who have supported bad ones. To develop a formal framework for desirable weighting schemes, we draw on no-regret learning. Specifically, given a voting rule, we wish to design a weighting scheme such that applying the voting rule, with voters weighted by the scheme, leads to choices that are almost as good as those endorsed by the best voter in hindsight. We derive possibility and impossibility results for the existence of such weighting schemes, depending on whether the voting rule and the weighting scheme are deterministic or randomized, as well as on the social choice axioms satisfied by the voting rule.
Deep neural networks coupled with fast simulation and improved computation have led to recent successes in the field of reinforcement learning (RL). However, most current RL-based approaches fail to generalize since: (a) the gap between simulation and real world is so large that policy-learning approaches fail to transfer; (b) even if policy learning is done in real world, the data scarcity leads to failed generalization from training to test scenarios (e.g., due to different friction or object masses). Inspired from H-infinity control methods, we note that both modeling errors and differences in training and test scenarios can be viewed as extra forces/disturbances in the system. This paper proposes the idea of robust adversarial reinforcement learning (RARL), where we train an agent to operate in the presence of a destabilizing adversary that applies disturbance forces to the system. The jointly trained adversary is reinforced -- that is, it learns an optimal destabilization policy. We formulate the policy learning as a zero-sum, minimax objective function. Extensive experiments in multiple environments (InvertedPendulum, HalfCheetah, Swimmer, Hopper and Walker2d) conclusively demonstrate that our method (a) improves training stability; (b) is robust to differences in training/test conditions; and c) outperform the baseline even in the absence of the adversary.
Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems. However, existing multi-agent RL methods typically scale poorly in the problem size. Therefore, a key challenge is to translate the success of deep learning on single-agent RL to the multi-agent setting. A key stumbling block is that independent Q-learning, the most popular multi-agent RL method, introduces nonstationarity that makes it incompatible with the experience replay memory on which deep RL relies. This paper proposes two methods that address this problem: 1) conditioning each agent's value function on a footprint that disambiguates the age of the data sampled from the replay memory and 2) using a multi-agent variant of importance sampling to naturally decay obsolete data. Results on a challenging decentralised variant of StarCraft unit micromanagement confirm that these methods enable the successful combination of experience replay with multi-agent RL.
Mar 01 2017 cs.MA
Congestion problems are omnipresent in today's complex networks and represent a challenge in many research domains. In the context of Multi-agent Reinforcement Learning (MARL), approaches like difference rewards and resource abstraction have shown promising results in tackling such problems. Resource abstraction was shown to be an ideal candidate for solving large-scale resource allocation problems in a fully decentralized manner. However, its performance and applicability strongly depends on some, until now, undocumented assumptions. Two of the main congestion benchmark problems considered in the literature are: the Beach Problem Domain and the Traffic Lane Domain. In both settings the highest system utility is achieved when overcrowding one resource and keeping the rest at optimum capacity. We analyse how abstract grouping can promote this behaviour and how feasible it is to apply this approach in a real-world domain (i.e., what assumptions need to be satisfied and what knowledge is necessary). We introduce a new test problem, the Road Network Domain (RND), where the resources are no longer independent, but rather part of a network (e.g., road network), thus choosing one path will also impact the load on other paths having common road segments. We demonstrate the application of state-of-the-art MARL methods for this new congestion model and analyse their performance. RND allows us to highlight an important limitation of resource abstraction and show that the difference rewards approach manages to better capture and inform the agents about the dynamics of the environment.
Successful analysis of player skills in video games has important impacts on the process of enhancing player experience without undermining their continuous skill development. Moreover, player skill analysis becomes more intriguing in team-based video games because such form of study can help discover useful factors in effective team formation. In this paper, we consider the problem of skill decomposition in MOBA (MultiPlayer Online Battle Arena) games, with the goal to understand what player skill factors are essential for the outcome of a game match. To understand the construct of MOBA player skills, we utilize various skill-based predictive models to decompose player skills into interpretative parts, the impact of which are assessed in statistical terms. We apply this analysis approach on two widely known MOBAs, namely League of Legends (LoL) and Defense of the Ancients 2 (DOTA2). The finding is that base skills of in-game avatars, base skills of players, and players' champion-specific skills are three prominent skill components influencing LoL's match outcomes, while those of DOTA2 are mainly impacted by in-game avatars' base skills but not much by the other two.
This paper addresses tracking of a moving target in a multi-agent network. The target follows a linear dynamics corrupted by an adversarial noise, i.e., the noise is not generated from a statistical distribution. The location of the target at each time induces a global time-varying loss function, and the global loss is a sum of local losses, each of which is associated to one agent. Agents noisy observations could be nonlinear. We formulate this problem as a distributed online optimization where agents communicate with each other to track the minimizer of the global loss. We then propose a decentralized version of the Mirror Descent algorithm and provide the non-asymptotic analysis of the problem. Using the notion of dynamic regret, we measure the performance of our algorithm versus its offline counterpart in the centralized setting. We prove that the bound on dynamic regret scales inversely in the network spectral gap, and it represents the adversarial noise causing deviation with respect to the linear dynamics. Our result subsumes a number of results in the distributed optimization literature. Finally, in a numerical experiment, we verify that our algorithm can be simply implemented for multi-agent tracking with nonlinear observations.
In multi-robot systems where a central decision maker is specifying the movement of each individual robot, a communication failure can severely impair the performance of the system. This paper develops a motion strategy that allows robots to safely handle critical communication failures for such multi-robot architectures. For each robot, the proposed algorithm computes a time horizon over which collisions with other robots are guaranteed not to occur. These safe time horizons are included in the commands being transmitted to the individual robots. In the event of a communication failure, the robots execute the last received velocity commands for the corresponding safe time horizons leading to a provably safe open-loop motion strategy. The resulting algorithm is computationally effective and is agnostic to the task that the robots are performing. The efficacy of the strategy is verified in simulation as well as on a team of differential-drive mobile robots.
We consider a swarm of $n$ autonomous mobile robots, distributed on a 2-dimensional grid. A basic task for such a swarm is the gathering process: all robots have to gather at one (not predefined) place. The work in this paper is motivated by the following insight: On one side, for swarms of robots distributed in the 2-dimensional Euclidean space, several gathering algorithms are known for extremely simple robots that are oblivious, have bounded viewing radius, no compass, and no "flags" to communicate a status to others. On the other side, in case of the 2-dimensional grid, the only known gathering algorithms for robots with bounded viewing radius without compass, need to memorize a constant number of rounds and need flags. In this paper we contribute the, to the best of our knowledge, first gathering algorithm on the grid, which works for anonymous, oblivious robots with bounded viewing range, without any further means of communication and without any memory. We prove its correctness and an $O(n^2)$ time bound. This time bound matches those of the best known algorithms for the Euclidean plane mentioned above.
Matrix games like Prisoner's Dilemma have guided research on social dilemmas for decades. However, they necessarily treat the choice to cooperate or defect as an atomic action. In real-world social dilemmas these choices are temporally extended. Cooperativeness is a property that applies to policies, not elementary actions. We introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions. We analyze the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games we introduce here: 1. a fruit Gathering game and 2. a Wolfpack hunting game. We characterize how learned behavior in each domain changes as a function of environmental factors including resource abundance. Our experiments show how conflict can emerge from competition over shared resources and shed light on how the sequential nature of real world social dilemmas affects cooperation.
Interactions between vehicles and pedestrians have always been a major problem in traffic safety. Experienced human drivers are able to analyze the environment and choose driving strategies that will help them avoid crashes. What is not yet clear, however, is how automated vehicles will interact with pedestrians. This paper proposes a new method for evaluating the safety and feasibility of the driving strategy of automated vehicles when encountering unsignalized crossings. MobilEye sensors installed on buses in Ann Arbor, Michigan, collected data on 2,973 valid crossing events. A stochastic interaction model was then created using a multivariate Gaussian mixture model. This model allowed us to simulate the movements of pedestrians reacting to an oncoming vehicle when approaching unsignalized crossings, and to evaluate the passing strategies of automated vehicles. A simulation was then conducted to demonstrate the evaluation procedure.
Social conventions govern countless behaviors all of us engage in every day, from how we greet each other to the languages we speak. But how can shared conventions emerge spontaneously in the absence of a central coordinating authority? The Naming Game model shows that networks of locally interacting individuals can spontaneously self-organize to produce global coordination. Here, we provide a gentle introduction to the main features of the model, from the dynamics observed in homogeneously mixing populations to the role played by more complex social networks, and to how slight modifications of the basic interaction rules give origin to a richer phenomenology in which more conventions can co-exist indefinitely.
Opponent modeling consists in modeling the strategy or preferences of an agent thanks to the data it provides. In the context of automated negotiation and with machine learning, it can result in an advantage so overwhelming that it may restrain some casual agents to be part of the bargaining process. We qualify as "curious" an agent driven by the desire of negotiating in order to collect information and improve its opponent model. However, neither curiosity-based rational-ity nor curiosity-robust protocol have been studied in automatic negotiation. In this paper, we rely on mechanism design to propose three extensions of the standard bargaining protocol that limit information leak. Those extensions are supported by an enhanced rationality model, that considers the exchanged information. Also, they are theoretically analyzed and experimentally evaluated.
Algorithms for equilibrium computation generally make no attempt to ensure that the computed strategies are understandable by humans. For instance the strategies for the strongest poker agents are represented as massive binary files. In many situations, we would like to compute strategies that can actually be implemented by humans, who may have computational limitations and may only be able to remember a small number of features or components of the strategies that have been computed. We study poker games where private information distributions can be arbitrary. We create a large training set of game instances and solutions, by randomly selecting the information probabilities, and present algorithms that learn from the training instances in order to perform well in games with unseen information distributions. We are able to conclude several new fundamental rules about poker strategy that can be easily implemented by humans.
The paper studies the problem of achieving consensus in multi-agent systems in the case where the dependency digraph $\Gamma$ has no spanning in-tree. We consider the regularization protocol that amounts to the addition of a dummy agent (hub) uniformly connected to the agents. The presence of such a hub guarantees the achievement of an asymptotic consensus. For the "evaporation" of the dummy agent, the strength of its influences on the other agents vanishes, which leads to the concept of latent consensus. We obtain a closed-form expression for the consensus when the connections of the hub are symmetric, in this case, the impact of the hub upon the consensus remains fixed. On the other hand, if the hub is essentially influenced by the agents, whereas its influence on them tends to zero, then the consensus is expressed by the scalar product of the vector of column means of the Laplacian eigenprojection of $\Gamma$ and the initial state vector of the system. Another protocol, which assumes the presence of vanishingly weak uniform background links between the agents, leads to the same latent consensus.
We present a distributed (non-Bayesian) learning algorithm for the problem of parameter estimation with Gaussian noise. The algorithm is expressed as explicit updates on the parameters of the Gaussian beliefs (i.e. means and precision). We show a convergence rate of $O(1/k)$ with the constant term depending on the number of agents and the topology of the network. Moreover, we show almost sure convergence to the optimal solution of the estimation problem for the general case of time-varying directed graphs.
In this paper we extend the principle of proportional representation to rankings. We consider the setting where alternatives need to be ranked based on approval preferences. In this setting, proportional representation requires that cohesive groups of voters are represented proportionally in each initial segment of the ranking. Proportional rankings are desirable in situations where initial segments of different lengths may be relevant, e.g., hiring decisions (if it is unclear how many positions are to be filled), the presentation of competing proposals on a liquid democracy platform (if it is unclear how many proposals participants are taking into consideration), or recommender systems (if a ranking has to accommodate different user types). We study the proportional representation provided by several ranking methods and prove theoretical guarantees. Furthermore, we experimentally evaluate these methods and present preliminary evidence as to which methods are most suitable for producing proportional rankings.
In this paper, we discuss how to design the graph topology to reduce the communication complexity of certain algorithms for decentralized optimization. Our goal is to minimize the total communication needed to achieve a prescribed accuracy. We discover that the so-called expander graphs are near-optimal choices. We propose three approaches to construct expander graphs for different numbers of nodes and node degrees. Our numerical results show that the performance of decentralized optimization is significantly better on expander graphs than other regular graphs.
The computational study of elections generally assumes that the preferences of the electorate come in as a list of votes. Depending on the context, it may be much more natural to represent the list succinctly, as the distinct votes of the electorate and their counts. We consider how this succinct representation of the voters affects the computational complexity of election problems. Though the succinct representation may be exponentially smaller than the nonsuccinct representation, we find only one natural case where the complexity increases, namely the complexity of winner determination for Kemeny elections. This is in sharp contrast to the case where each voter has a weight, where the complexity usually increases.
Nov 24 2016 cs.MA
Published during a severe economic crisis, this study presents the first spatial microsimulation model for the analysis of income inequalities and poverty in Greece. First, we present a brief overview of the method and discuss its potential for the analysis of multidimensional poverty and income inequality in Greece. We then present the SimAthens model, based on a combination of small-area demographic and socioeconomic information available from the Greek census of population with data from the European Union Statistics on Income and Living Conditions (EU-SILC). The model is based on an iterative proportional fitting (IPF) algorithm, and is used to reweigh EU-SILC records to fit in small-area descriptions for Athens based on 2001 and 2011 censuses. This is achieved by using demographic and socioeconomic characteristics as constraint variables. Finally, synthesis of the labor market and occupations are chosen as the main variables for externally validating our results, in order to verify the integrity of the model. Results of this external validation process are found to be extremely satisfactory, indicating a high goodness of fit between simulated and real values. Finally, the study presents a number of model outputs, illustrating changes in social and economic geography, during a severe economic crisis, offering a great opportunity for discussing further potential of this model in policy analysis.
Logic-based representations of multi-agent systems have been extensively studied. In this work, we focus on the action language BC to formalize global views of MAS domains. Methodologically, we start representing the behaviour of each agent by an action description from a single agent perspective. Then, it goes through two stages that guide the modeler in composing the global view by first designating multi-agent aspects of the domain via potential conflicts and later resolving these conflicts according to the expected behaviour of the overall system. Considering that representing single agent descriptions is relatively simpler than representing multi-agent description directly, the formalization developed here is valuable from a knowledge representation perspective.
Acquiring your first language is an incredible feat and not easily duplicated. Learning to communicate using nothing but a few pictureless books, a corpus, would likely be impossible even for humans. Nevertheless, this is the dominating approach in most natural language processing today. As an alternative, we propose the use of situated interactions between agents as a driving force for communication, and the framework of Deep Recurrent Q-Networks for evolving a shared language grounded in the provided environment. We task the agents with interactive image search in the form of the game Guess Who?. The images from the game provide a non trivial environment for the agents to discuss and a natural grounding for the concepts they decide to encode in their communication. Our experiments show that the agents learn not only to encode physical concepts in their words, i.e. grounding, but also that the agents learn to hold a multi-step dialogue remembering the state of the dialogue from step to step.
Generative adversarial networks (GANs) are a framework for producing a generative model by way of a two-player minimax game. In this paper, we propose the \emphGenerative Multi-Adversarial Network (GMAN), a framework that extends GANs to multiple discriminators. In previous work, the successful training of GANs requires modifying the minimax objective to accelerate training early on. In contrast, GMAN can be reliably trained with the original, untampered objective. We explore a number of design perspectives with the discriminator role ranging from formidable adversary to forgiving teacher. Image generation tasks comparing the proposed framework to standard GANs demonstrate GMAN produces higher quality samples in a fraction of the iterations when measured by a pairwise GAM-type metric.
We consider election scenarios with incomplete information, a situation that arises often in practice. There are several models of incomplete information and accordingly, different notions of outcomes of such elections. In one well-studied model of incompleteness, the votes are given by partial orders over the candidates. In this context we can frame the problem of finding a possible winner, which involves determining whether a given candidate wins in at least one completion of a given set of partial votes for a specific voting rule. The possible winner problem is well-known to be NP-complete in general, and it is in fact known to be NP-complete for several voting rules where the number of undetermined pairs in every vote is bounded only by some constant. In this paper, we address the question of determining precisely the smallest number of undetermined pairs for which the possible winner problem remains NP-complete. In particular, we find the exact values of $t$ for which the possible winner problem transitions to being NP-complete from being in P, where $t$ is the maximum number of undetermined pairs in every vote. We demonstrate tight results for a broad subclass of scoring rules which includes all the commonly used scoring rules (such as plurality, veto, Borda, $k$-approval, and so on), Copeland$^\alpha$ for every $\alpha\in[0,1]$, maximin, and Bucklin voting rules. A somewhat surprising aspect of our results is that for many of these rules, the possible winner problem turns out to be hard even if every vote has at most one undetermined pair of candidates.
In participatory budgeting, communities collectively decide on the allocation of public tax dollars for local public projects. In this work, we consider the question of fairly aggregating the preferences of community members to determine an allocation of funds to projects. This problem is different from standard fair resource allocation because of public goods: The allocated goods benefit all users simultaneously. Fairness is crucial in participatory decision making, since generating equitable outcomes is an important goal of these processes. We argue that the classic game theoretic notion of core captures fairness in the setting. To compute the core, we first develop a novel characterization of a public goods market equilibrium called the Lindahl equilibrium, which is always a core solution. We then provide the first (to our knowledge) polynomial time algorithm for computing such an equilibrium for a broad set of utility functions; our algorithm also generalizes (in a non-trivial way) the well-known concept of proportional fairness. We use our theoretical insights to perform experiments on real participatory budgeting voting data. We empirically show that the core can be efficiently computed for utility functions that naturally model our practical setting, and examine the relation of the core with the familiar welfare objective. Finally, we address concerns of incentives and mechanism design by developing a randomized approximately dominant-strategy truthful mechanism building on the exponential mechanism from differential privacy.
This paper proposes models of learning process in teams of individuals who collectively execute a sequence of tasks and whose actions are determined by individual skill levels and networks of interpersonal appraisals and influence. The closely-related proposed models have increasing complexity, starting with a centralized manager-based assignment and learning model, and finishing with a social model of interpersonal appraisal, assignments, learning, and influences. We show how rational optimal behavior arises along the task sequence for each model, and discuss conditions of suboptimality. Our models are grounded in replicator dynamics from evolutionary games, influence networks from mathematical sociology, and transactive memory systems from organization science.
Sep 27 2016 cs.MA
Finding feasible, collision-free paths for multiagent systems can be challenging, particularly in non-communicating scenarios where each agent's intent (e.g. goal) is unobservable to the others. In particular, finding time efficient paths often requires anticipating interaction with neighboring agents, the process of which can be computationally prohibitive. This work presents a decentralized multiagent collision avoidance algorithm based on a novel application of deep reinforcement learning, which effectively offloads the online computation (for predicting interaction patterns) to an offline learning procedure. Specifically, the proposed approach develops a value network that encodes the estimated time to the goal given an agent's joint configuration (positions and velocities) with its neighbors. Use of the value network not only admits efficient (i.e., real-time implementable) queries for finding a collision-free velocity vector, but also considers the uncertainty in the other agents' motion. Simulation results show more than 26 percent improvement in paths quality (i.e., time to reach the goal) when compared with optimal reciprocal collision avoidance (ORCA), a state-of-the-art collision avoidance strategy.
We overview some results on distributed learning with focus on a family of recently proposed algorithms known as non-Bayesian social learning. We consider different approaches to the distributed learning problem and its algorithmic solutions for the case of finitely many hypotheses. The original centralized problem is discussed at first, and then followed by a generalization to the distributed setting. The results on convergence and convergence rate are presented for both asymptotic and finite time regimes. Various extensions are discussed such as those dealing with directed time-varying networks, Nesterov's acceleration technique and a continuum sets of hypothesis.
Studies on microscopic pedestrian requires large amounts of trajectory data from real-world pedestrian crowds. Such data collection, if done manually, needs tremendous effort and is very time consuming. Though many studies have asserted the possibility of automating this task using video cameras, we found that only a few have demonstrated good performance in very crowded situations or from a top-angled view scene. This paper deals with tracking pedestrian crowd under heavy occlusions from an angular scene. Our automated tracking system consists of two modules that perform sequentially. The first module detects moving objects as blobs. The second module is a tracking system. We employ probability distribution from the detection of each pedestrian and use Bayesian update to track the next position. The result of such tracking is a database of pedestrian trajectories over time and space. With certain prior information, we showed that the system can track a large number of people under occlusion and clutter scene.
We study the computational complexity of several scenarios of strategic behavior for the Kemeny procedure in the setting of judgment aggregation. In particular, we investigate (1) manipulation, where an individual aims to achieve a better group outcome by reporting an insincere individual opinion, (2) bribery, where an external agent aims to achieve an outcome with certain properties by bribing a number of individuals, and (3) control (by adding or deleting issues), where an external agent aims to achieve an outcome with certain properties by influencing the set of issues in the judgment aggregation situation. We show that determining whether these types of strategic behavior are possible (and if so, computing a policy for successful strategic behavior) is complete for the second level of the Polynomial Hierarchy. That is, we show that these problems are $\Sigma^p_2$-complete.
In this paper, we develop an agent-based model which integrates four important elements, i.e. organisational energy management policies/regulations, energy management technologies, electric appliances and equipment, and human behaviour, based on a case study, to simulate the energy consumption in office buildings. With the model, we test the effectiveness of different energy management strategies, and solve practical office energy consumption problems. This paper theoretically contributes to an integration of four elements involved in the complex organisational issue of office energy consumption, and practically contributes to an application of agent-based approach for office building energy consumption study.
A framework for consensus modelling is introduced using Kleene's three valued logic as a means to express vagueness in agents' beliefs. Explicitly borderline cases are inherent to propositions involving vague concepts where sentences of a propositional language may be absolutely true, absolutely false or borderline. By exploiting these intermediate truth values, we can allow agents to adopt a more vague interpretation of underlying concepts in order to weaken their beliefs and reduce the levels of inconsistency, so as to achieve consensus. We consider a consensus combination operation which results in agents adopting the borderline truth value as a shared viewpoint if they are in direct conflict. Simulation experiments are presented which show that applying this operator to agents chosen at random (subject to a consistency threshold) from a population, with initially diverse opinions, results in convergence to a smaller set of more precise shared beliefs. Furthermore, if the choice of agents for combination is dependent on the payoff of their beliefs, this acting as a proxy for performance or usefulness, then the system converges to beliefs which, on average, have higher payoff.
We consider the positioning problem of aerial drone systems for efficient three-dimensional (3-D) coverage. Our solution draws from molecular geometry, where forces among electron pairs surrounding a central atom arrange their positions. In this paper, we propose a 3-D clustering algorithm for autonomous positioning (VBCA) of aerial drone networks based on virtual forces. These virtual forces induce interactions among drones and structure the system topology. The advantages of our approach are that (1) virtual forces enable drones to self-organize the positioning process and (2) VBCA can be implemented entirely localized. Extensive simulations show that our virtual forces clustering approach produces scalable 3-D topologies exhibiting near-optimal volume coverage. VBCA triggers efficient topology rearrangement for an altering number of nodes, while providing network connectivity to the central drone. We also draw a comparison of volume coverage achieved by VBCA against existing approaches and find VBCA up to 40\% more efficient.
A true lie is a lie that becomes true when announced. In a logic of announcements, where the announcing agent is not modelled, a true lie is a formula (that is false and) that becomes true when announced. We investigate true lies and other types of interaction between announced formulas, their preconditions and their postconditions, in the setting Gerbrandy's logic of believed announcements, wherein agents may have or obtain incorrect beliefs. Our results are on the satisfiability and validity of instantiations of these semantically defined categories, on iterated announcements, including arbitrarily often iterated announcements, and on syntactic characterization. We close with results for iterated announcements in the logic of knowledge (instead of belief), and for lying as private announcements (instead of public announcements) to different agents. Detailed examples illustrate our lying concepts.
Extensive work has been conducted both in game theory and logic to model strategic interaction. An important question is whether we can use these theories to design agents for interacting with people? On the one hand, they provide a formal design specification for agent strategies. On the other hand, people do not necessarily adhere to playing in accordance with these strategies, and their behavior is affected by a multitude of social and psychological factors. In this paper we will consider the question of whether strategies implied by theories of strategic behavior can be used by automated agents that interact proficiently with people. We will focus on automated agents that we built that need to interact with people in two negotiation settings: bargaining and deliberation. For bargaining we will study game-theory based equilibrium agents and for argumentation we will discuss logic-based argumentation theory. We will also consider security games and persuasion games and will discuss the benefits of using equilibrium based agents.
We consider the problem of multiple agents sensing and acting in environments with the goal of maximising their shared utility. In these environments, agents must learn communication protocols in order to share information that is needed to solve the tasks. By embracing deep neural networks, we are able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability. We propose two approaches for learning in these domains: Reinforced Inter-Agent Learning (RIAL) and Differentiable Inter-Agent Learning (DIAL). The former uses deep Q-learning, while the latter exploits the fact that, during learning, agents can backpropagate error derivatives through (noisy) communication channels. Hence, this approach uses centralised learning but decentralised execution. Our experiments introduce new environments for studying the learning of communication protocols and present a set of engineering innovations that are essential for success in these domains.
May 09 2016 cs.MA
This paper presents a computational approach to modelling group creativity. It presents an analysis of two studies of group creativity selected from different research cultures and identifies a common theme ("idea build-up") that is then used in the formalisation of an agent-based model used to support reasoning about the complex dynamics of building on the ideas of others.
Peer review, evaluation, and selection is the foundation on which modern science is built. Funding bodies the world over employ experts to study and select the best proposals of those submitted for funding. The problem of peer selection, however, is much more universal: a professional society may want give a subset of its members awards based on the opinions of all the members; an instructor for a MOOC or online course may want to crowdsource grading; or a marketing company may select ideas from group brainstorming sessions based on peer evaluation. We make three fundamental contributions to the study of procedures or mechanisms for peer selection, a specific type of group decision making problem studied in computer science, economics, political science, and beyond. First, we detail a novel mechanism that is strategyproof, i.e., agents cannot benefit themselves by reporting insincere valuations, in addition to other desirable normative properties. Second, we demonstrate the effectiveness of our mechanism through a comprehensive simulation based comparison of our mechanism with a suite of mechanisms found in the computer science and economics literature. Finally, our mechanism employs a randomized rounding technique that is of independent interest, as it can be used as a randomized method to addresses the ubiquitous apportionment problem that arises in various settings where discrete resources such as parliamentary representation slots need to be divided fairly.
Despite much scientific evidence, a large fraction of the American public doubts that greenhouse gases are causing global warming. We present a simulation model as a computational test-bed for climate prediction markets. Traders adapt their beliefs about future temperatures based on the profits of other traders in their social network. We simulate two alternative climate futures, in which global temperatures are primarily driven either by carbon dioxide or by solar irradiance. These represent, respectively, the scientific consensus and a hypothesis advanced by prominent skeptics. We conduct sensitivity analyses to determine how a variety of factors describing both the market and the physical climate may affect traders' beliefs about the cause of global climate change. Market participation causes most traders to converge quickly toward believing the "true" climate model, suggesting that a climate market could be useful for building public consensus.
This article outlines a method for automatically generating models of dynamic decision-making that both have strong predictive power and are interpretable in human terms. This is useful for designing empirically grounded agent-based simulations and for gaining direct insight into observed dynamic processes. We use an efficient model representation and a genetic algorithm-based estimation process to generate simple approximations that explain most of the structure of complex stochastic processes. This method, implemented in C++ and R, scales well to large data sets. We apply our methods to empirical data from human subjects game experiments and international relations. We also demonstrate the method's ability to recover known data-generating processes by simulating data with agent-based models and correctly deriving the underlying decision models for multiple agent models and degrees of stochasticity.
Mar 28 2017 cs.MA
Norms have been extensively proposed as coordination mechanisms for both agent and human societies. Nevertheless, choosing the norms to regulate a society is by no means straightforward. The reasons are twofold. First, the norms to choose from may not be independent (i.e, they can be related to each other). Second, different preference criteria may be applied when choosing the norms to enact. This paper advances the state of the art by modeling a series of decision-making problems that regulation authorities confront when choosing the policies to establish. In order to do so, we first identify three different norm relationships -namely, generalisation, exclusivity, and substitutability- and we then consider norm representation power, cost, and associated moral values as alternative preference criteria. Thereafter, we show that the decision-making problems faced by policy makers can be encoded as linear programs, and hence solved with the aid of state-of-the-art solvers.
An event-based state estimation approach for reducing communication in a networked control system is proposed. Multiple distributed sensor agents observe a dynamic process and sporadically transmit their measurements to estimator agents over a shared bus network. Local event-triggering protocols ensure that data is transmitted only when necessary to meet a desired estimation accuracy. The event-based design is shown to emulate the performance of a centralised state observer design up to guaranteed bounds, but with reduced communication. The stability results for state estimation are extended to the distributed control system that results when the local estimates are used for feedback control. Results from numerical simulations and hardware experiments illustrate the effectiveness of the proposed approach in reducing network communication.
This thesis is in the area called computational social choice which is an intersection area of algorithms and social choice theory.
Motivated by economic dispatch and linearly-constrained resource allocation problems, this paper proposes a novel Distributed Approx-Newton algorithm that approximates the standard Newton optimization method. A main property of this distributed algorithm is that it only requires agents to exchange constant-size communication messages. The convergence of this algorithm is discussed and rigorously analyzed. In addition, we aim to address the problem of designing communication topologies and weightings that are optimal for second-order methods. To this end, we propose an effective approximation which is loosely based on completing the square to address the NP-hard bilinear optimization involved in the design. Simulations demonstrate that our proposed weight design applied to the Distributed Approx-Newton algorithm has a superior convergence property compared to existing weighted and distributed first-order gradient descent methods.
We consider the problem of controlling the spatiotemporal probability distribution of a robotic swarm that evolves according to a reflected diffusion process, using the space- and time-dependent drift vector field parameter as the control variable. In contrast to previous work on control of the Fokker-Planck equation, a zero-flux boundary condition is imposed on the partial differential equation that governs the swarm probability distribution, and only bounded vector fields are considered to be admissible as control parameters. Under these constraints, we show that any initial probability distribution can be transported to a target probability distribution under certain assumptions on the regularity of the target distribution. In particular, we show that if the target distribution is (essentially) bounded, has bounded first-order and second-order partial derivatives, and is bounded from below by a strictly positive constant, then this distribution can be reached exactly using a drift vector field that is bounded in space and time. Our proof is constructive and based on classical linear semigroup theoretic concepts.
In this paper, we focus on applications in machine learning, optimization, and control that call for the resilient selection of a few elements, e.g. features, sensors, or leaders, against a number of adversarial denial-of-service attacks or failures. In general, such resilient optimization problems are hard, and cannot be solved exactly in polynomial time, even though they often involve objective functions that are monotone and submodular. Notwithstanding, in this paper we provide the first scalable, curvature-dependent algorithm for their approximate solution, that is valid for any number of attacks or failures, and which, for functions with low curvature, guarantees superior approximation performance. Notably, the curvature has been known to tighten approximations for several non-resilient maximization problems, yet its effect on resilient maximization had hitherto been unknown. We complement our theoretical analyses with supporting empirical evaluations.
Recently the dynamics of signed networks, where the ties among the agents can be both positive (attractive) or negative (repulsive) have attracted substantial attention of the research community. Examples of such networks are models of opinion dynamics over signed graphs, recently introduced by Altafini (2012,2013) and extended to discrete-time case by Meng et al. (2014). It has been shown that under mild connectivity assumptions these protocols provide the convergence of opinions in absolute value, whereas their signs may differ. This "modulus consensus" may correspond to the polarization of the opinions (or bipartite consensus, including the usual consensus as a special case), or their convergence to zero. In this paper, we demonstrate that the phenomenon of modulus consensus in the discrete-time Altafini model is a manifestation of a more general and profound fact, regarding the solutions of a special recurrent inequality. Although such a recurrent inequality does not provide the uniqueness of a solution, it can be shown that, under some natural assumptions, each of its bounded solutions has a limit and, moreover, converges to consensus. A similar property has previously been established for special continuous-time differential inequalities (Proskurnikov, Cao, 2016). Besides analysis of signed networks, we link the consensus properties of recurrent inequalities to the convergence analysis of distributed optimization algorithms and the problems of Schur stability of substochastic matrices.
In this paper we consider the problem of identifying intersections between two sets of d-dimensional axis-parallel rectangles. This is a common problem that arises in many agent-based simulation studies, and is of central importance in the context of High Level Architecture (HLA), where it is at the core of the Data Distribution Management (DDM) service. Several realizations of the DDM service have been proposed; however, many of them are either inefficient or inherently sequential. These are serious limitations since multicore processors are now ubiquitous, and DDM algorithms -- being CPU-intensive -- could benefit from additional computing power. We propose a parallel version of the Sort-Based Matching algorithm for shared-memory multiprocessors. Sort-Based Matching is one of the most efficient serial algorithms for the DDM problem, but is quite difficult to parallelize due to data dependencies. We describe the algorithm and compute its asymptotic running time; we complete the analysis by assessing its performance and scalability through extensive experiments on two commodity multicore systems based on a dual socket Intel Xeon processor, and a single socket Intel Core i7 processor.
This paper focuses on a passivity-based distributed reference governor (RG) applied to a pre-stabilized mobile robotic network. The novelty of this paper lies in the method used to solve the RG problem, where a passivity-based distributed optimization scheme is proposed. In particular, the gradient descent method minimizes the global objective function while the dual ascent method maximizes the Hamiltonian. To make the agents converge to the agreed optimal solution, a proportional-integral consensus estimator is used. This paper proves the convergence of the state estimates of the RG to the optimal solution through passivity arguments, considering the physical system static. Then, the effectiveness of the scheme considering the dynamics of the physical system is demonstrated through simulations and experiments.