# Multiagent Systems (cs.MA)

• Robust environment perception is essential for decision-making on robots operating in complex domains. Intelligent task execution requires principled treatment of uncertainty sources in a robot's observation model. This is important not only for low-level observations (e.g., accelerometer data), but also for high-level observations such as semantic object labels. This paper formalizes the concept of macro-observations in Decentralized Partially Observable Semi-Markov Decision Processes (Dec-POSMDPs), allowing scalable semantic-level multi-robot decision making. A hierarchical Bayesian approach is used to model noise statistics of low-level classifier outputs, while simultaneously allowing sharing of domain noise characteristics between classes. Classification accuracy of the proposed macro-observation scheme, called Hierarchical Bayesian Noise Inference (HBNI), is shown to exceed existing methods. The macro-observation scheme is then integrated into a Dec-POSMDP planner, with hardware experiments running onboard a team of dynamic quadrotors in a challenging domain where noise-agnostic filtering fails. To the best of our knowledge, this is the first demonstration of a real-time, convolutional neural net-based classification framework running fully onboard a team of quadrotors in a multi-robot decision-making domain.
• Voting systems typically treat all voters equally. We argue that perhaps they should not: Voters who have supported good choices in the past should be given higher weight than voters who have supported bad ones. To develop a formal framework for desirable weighting schemes, we draw on no-regret learning. Specifically, given a voting rule, we wish to design a weighting scheme such that applying the voting rule, with voters weighted by the scheme, leads to choices that are almost as good as those endorsed by the best voter in hindsight. We derive possibility and impossibility results for the existence of such weighting schemes, depending on whether the voting rule and the weighting scheme are deterministic or randomized, as well as on the social choice axioms satisfied by the voting rule.
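The reweighting idea can be sketched in Python with the classic multiplicative-weights (Weighted Majority) scheme applied to weighted plurality voting. The sketch assumes that each alternative's loss (0 = good, 1 = bad) is revealed after the round. `weighted_plurality`, `run_weighted_votes`, and `eta` are illustrative names. This is not the weighting scheme constructed in the paper, whose possibility and impossibility results depend on the voting rule.

```python
from collections import defaultdict

def weighted_plurality(votes, weights):
    """Alternative with the largest total voter weight; ties go to the alphabetically first."""
    score = defaultdict(float)
    for voter, alt in votes.items():
        score[alt] += weights[voter]
    return max(sorted(score), key=lambda alt: score[alt])

def run_weighted_votes(rounds, voters, eta=0.5):
    """Multiplicative weights: after each round, every voter whose alternative
    turned out bad (loss 1) has its weight multiplied by (1 - eta)."""
    weights = {v: 1.0 for v in voters}
    chosen = []
    for votes, loss in rounds:
        chosen.append(weighted_plurality(votes, weights))
        for voter, alt in votes.items():
            weights[voter] *= (1 - eta) ** loss[alt]
    return chosen, weights

# Voter "a" always backs the good alternative; "b" and "c" always back the bad one.
round_ = ({"a": "x", "b": "y", "c": "y"}, {"x": 0, "y": 1})
chosen, weights = run_weighted_votes([round_] * 3, ["a", "b", "c"], eta=0.75)
```

The bad coalition wins the first round. Its weight then shrinks geometrically, so from the second round on the single good voter decides.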
• Deep neural networks coupled with fast simulation and improved computation have led to recent successes in the field of reinforcement learning (RL). However, most current RL-based approaches fail to generalize since: (a) the gap between simulation and the real world is so large that policy-learning approaches fail to transfer; (b) even if policy learning is done in the real world, data scarcity leads to failed generalization from training to test scenarios (e.g., due to different friction or object masses). Inspired by H-infinity control methods, we note that both modeling errors and differences in training and test scenarios can be viewed as extra forces/disturbances in the system. This paper proposes the idea of robust adversarial reinforcement learning (RARL), where we train an agent to operate in the presence of a destabilizing adversary that applies disturbance forces to the system. The jointly trained adversary is reinforced, that is, it learns an optimal destabilization policy. We formulate the policy learning as a zero-sum, minimax objective function. Extensive experiments in multiple environments (InvertedPendulum, HalfCheetah, Swimmer, Hopper and Walker2d) conclusively demonstrate that our method (a) improves training stability; (b) is robust to differences in training/test conditions; and (c) outperforms the baseline even in the absence of the adversary.
• Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems. However, existing multi-agent RL methods typically scale poorly in the problem size. Therefore, a key challenge is to translate the success of deep learning on single-agent RL to the multi-agent setting. A key stumbling block is that independent Q-learning, the most popular multi-agent RL method, introduces nonstationarity that makes it incompatible with the experience replay memory on which deep RL relies. This paper proposes two methods that address this problem: 1) conditioning each agent's value function on a fingerprint that disambiguates the age of the data sampled from the replay memory and 2) using a multi-agent variant of importance sampling to naturally decay obsolete data. Results on a challenging decentralised variant of StarCraft unit micromanagement confirm that these methods enable the successful combination of experience replay with multi-agent RL.
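The fingerprint idea is a tag, stored with each transition, that disambiguates the age of replayed data. Below is a minimal sketch. It assumes, as a hypothetical choice, that the fingerprint is the pair (training iteration, exploration rate ε) appended to each stored observation. The class and method names are illustrative, and the importance-sampling variant is not shown.

```python
import random
from collections import deque

class FingerprintReplay:
    """Replay memory whose stored observations are augmented with a fingerprint
    (training iteration, exploration rate) recording when the data was collected."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are evicted first

    def add(self, obs, action, reward, next_obs, iteration, epsilon):
        fingerprint = (float(iteration), float(epsilon))
        self.buffer.append((tuple(obs) + fingerprint, action, reward,
                            tuple(next_obs) + fingerprint))

    def sample(self, batch_size, rng):
        # The value network sees the fingerprint as extra input features, so it can
        # condition on how old (and how exploratory) the other agents' policies were.
        return rng.sample(list(self.buffer), batch_size)

memory = FingerprintReplay(capacity=1000)
memory.add(obs=[0.1, 0.2], action=1, reward=0.0, next_obs=[0.3, 0.4], iteration=7, epsilon=0.05)
batch = memory.sample(1, random.Random(0))
```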
• Congestion problems are omnipresent in today's complex networks and represent a challenge in many research domains. In the context of Multi-agent Reinforcement Learning (MARL), approaches like difference rewards and resource abstraction have shown promising results in tackling such problems. Resource abstraction was shown to be an ideal candidate for solving large-scale resource allocation problems in a fully decentralized manner. However, its performance and applicability strongly depend on some, until now, undocumented assumptions. Two of the main congestion benchmark problems considered in the literature are the Beach Problem Domain and the Traffic Lane Domain. In both settings the highest system utility is achieved by overcrowding one resource and keeping the rest at optimum capacity. We analyse how abstract grouping can promote this behaviour and how feasible it is to apply this approach in a real-world domain (i.e., what assumptions need to be satisfied and what knowledge is necessary). We introduce a new test problem, the Road Network Domain (RND), where the resources are no longer independent, but rather part of a network (e.g., road network), thus choosing one path will also impact the load on other paths having common road segments. We demonstrate the application of state-of-the-art MARL methods for this new congestion model and analyse their performance. RND allows us to highlight an important limitation of resource abstraction and show that the difference rewards approach manages to better capture and inform the agents about the dynamics of the environment.
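Difference rewards can be illustrated on a sketch of the Beach Problem Domain. The sketch assumes the common formulation in which a beach section holding $x$ agents with capacity $c$ yields local reward $x e^{-x/c}$. The system utility $G$ is the sum over sections, and agent $i$'s difference reward is $D_i = G(z) - G(z_{-i})$, the utility lost when $i$ is removed. Function names are hypothetical.

```python
import math
from collections import Counter

def section_reward(x, capacity):
    """Local reward of a section with x agents (assumed form x * exp(-x / c))."""
    return x * math.exp(-x / capacity)

def system_utility(choices, capacity, n_sections):
    counts = Counter(choices)          # choices[i] = section chosen by agent i
    return sum(section_reward(counts[k], capacity) for k in range(n_sections))

def difference_reward(choices, i, capacity, n_sections):
    """D_i = G(z) - G(z_{-i}): system utility with agent i minus utility without it."""
    without_i = choices[:i] + choices[i + 1:]
    return (system_utility(choices, capacity, n_sections)
            - system_utility(without_i, capacity, n_sections))

choices = [0, 0, 0, 1]                 # three agents crowd section 0, one is alone in section 1
d_crowded = difference_reward(choices, 0, capacity=2.0, n_sections=2)
d_alone = difference_reward(choices, 3, capacity=2.0, n_sections=2)
```

An agent in the overcrowded section receives a negative difference reward, even though that section still contributes positively to $G$. The lone agent's reward stays positive.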
• Successful analysis of player skills in video games has important impacts on the process of enhancing player experience without undermining their continuous skill development. Moreover, player skill analysis becomes more intriguing in team-based video games because such a study can help discover useful factors for effective team formation. In this paper, we consider the problem of skill decomposition in MOBA (Multiplayer Online Battle Arena) games, with the goal of understanding which player skill factors are essential for the outcome of a game match. To understand the construct of MOBA player skills, we use various skill-based predictive models to decompose player skills into interpretable parts, whose impact is assessed in statistical terms. We apply this analysis approach to two widely known MOBAs, namely League of Legends (LoL) and Defense of the Ancients 2 (DOTA2). We find that the base skills of in-game avatars, the base skills of players, and players' champion-specific skills are three prominent skill components influencing LoL's match outcomes, whereas DOTA2's match outcomes are mainly influenced by the in-game avatars' base skills and much less by the other two.
• This paper addresses tracking of a moving target in a multi-agent network. The target follows linear dynamics corrupted by adversarial noise, i.e., the noise is not generated from a statistical distribution. The location of the target at each time induces a global time-varying loss function, and the global loss is a sum of local losses, each of which is associated with one agent. The agents' noisy observations may be nonlinear. We formulate this problem as a distributed online optimization problem in which agents communicate with each other to track the minimizer of the global loss. We then propose a decentralized version of the Mirror Descent algorithm and provide a non-asymptotic analysis of the problem. Using the notion of dynamic regret, we measure the performance of our algorithm against its offline counterpart in the centralized setting. We prove that the bound on dynamic regret scales inversely in the network spectral gap and that it reflects the adversarial noise causing deviation from the linear dynamics. Our result subsumes a number of results in the distributed optimization literature. Finally, in a numerical experiment, we verify that our algorithm can be simply implemented for multi-agent tracking with nonlinear observations.
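Here is a minimal sketch of decentralized mirror descent for tracking. It rests on simplifying assumptions of our own:

- a scalar target with dynamics $\theta_{t+1} = a\theta_t$ and no adversarial noise;
- linear, noise-free local observations $y_i = h_i\theta_t$ with losses $f_i(x) = \tfrac12 (h_i x - y_i)^2$;
- the Euclidean mirror map $\tfrac12\|x\|^2$, under which the mirror step reduces to a plain gradient step.

`W` is an assumed doubly stochastic mixing matrix, and `track` is an illustrative name.

```python
import numpy as np

def track(W, h, a, thetas, eta):
    x = np.zeros(W.shape[0])          # each agent's estimate of the scalar target
    errors = []
    for theta in thetas:
        y = h * theta                 # agent i observes y_i = h_i * theta (noise-free here)
        grad = h * (h * x - y)        # gradient of f_i(x_i) = 0.5 * (h_i x_i - y_i)^2
        x = a * (W @ x - eta * grad)  # mix with neighbours, descend, then apply the dynamics
        errors.append(np.max(np.abs(x - a * theta)))   # compare with theta_{t+1} = a * theta_t
    return x, errors

W = np.array([[0.5, 0.5], [0.5, 0.5]])     # doubly stochastic mixing matrix
h = np.array([1.0, 2.0])
a = 0.99
thetas = [5.0 * a**t for t in range(200)]
x, errors = track(W, h, a, thetas, eta=0.1)
```

Because $W\mathbf{1} = \mathbf{1}$, the tracking error evolves as $e_{t+1} = a(W - \eta H^2)e_t$ with $H = \mathrm{diag}(h)$. It therefore vanishes whenever that matrix is a contraction, as it is for the values above.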
• This thesis is in the area of computational social choice, which lies at the intersection of algorithms and social choice theory.
• Motivated by economic dispatch and linearly-constrained resource allocation problems, this paper proposes a novel Distributed Approx-Newton algorithm that approximates the standard Newton optimization method. A main property of this distributed algorithm is that it only requires agents to exchange constant-size communication messages. The convergence of this algorithm is discussed and rigorously analyzed. In addition, we aim to address the problem of designing communication topologies and weightings that are optimal for second-order methods. To this end, we propose an effective approximation which is loosely based on completing the square to address the NP-hard bilinear optimization involved in the design. Simulations demonstrate that our proposed weight design applied to the Distributed Approx-Newton algorithm has a superior convergence property compared to existing weighted and distributed first-order gradient descent methods.
• We consider the problem of controlling the spatiotemporal probability distribution of a robotic swarm that evolves according to a reflected diffusion process, using the space- and time-dependent drift vector field parameter as the control variable. In contrast to previous work on control of the Fokker-Planck equation, a zero-flux boundary condition is imposed on the partial differential equation that governs the swarm probability distribution, and only bounded vector fields are considered to be admissible as control parameters. Under these constraints, we show that any initial probability distribution can be transported to a target probability distribution under certain assumptions on the regularity of the target distribution. In particular, we show that if the target distribution is (essentially) bounded, has bounded first-order and second-order partial derivatives, and is bounded from below by a strictly positive constant, then this distribution can be reached exactly using a drift vector field that is bounded in space and time. Our proof is constructive and based on classical linear semigroup theoretic concepts.
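The sketch below illustrates, in one dimension, why a strictly positive target density with bounded derivatives admits a bounded drift. With diffusion coefficient $D$ and drift $v = D\,\partial_x \log\rho^*$, the flux $J = v\rho^* - D\,\partial_x\rho^*$ vanishes identically. So $\rho^*$ is a stationary solution of the Fokker-Planck equation under the zero-flux boundary condition, and $|v| \le D \sup|\partial_x\rho^*| / \inf\rho^*$. This is only an asymptotic, explicit finite-volume illustration on $[0, 1]$ with a target we chose. It is not the paper's constructive finite-time transport argument.

```python
import numpy as np

def stationary_drift_demo(N=50, D=0.1, T=20.0):
    dx = 1.0 / N
    xc = (np.arange(N) + 0.5) * dx                # cell centres on [0, 1]
    xf = np.arange(1, N) * dx                     # interior cell faces
    target = 1 + 0.5 * np.cos(2 * np.pi * xc)    # smooth, >= 0.5, integrates to 1
    # v = D * (rho*)' / rho* at interior faces; bounded because rho* >= 0.5
    v = D * (-np.pi * np.sin(2 * np.pi * xf)) / (1 + 0.5 * np.cos(2 * np.pi * xf))
    rho = np.ones(N)                              # start from the uniform density
    dt = 0.25 * dx**2 / D                         # explicit-Euler stability margin
    for _ in range(int(T / dt)):
        # Flux J = v*rho - D*rho_x at interior faces (central average for rho)
        J = v * 0.5 * (rho[:-1] + rho[1:]) - D * (rho[1:] - rho[:-1]) / dx
        J = np.concatenate(([0.0], J, [0.0]))      # zero-flux boundary at x = 0 and x = 1
        rho = rho - dt * (J[1:] - J[:-1]) / dx     # d(rho)/dt = -dJ/dx
    return rho, target, v, dx

rho, target, v, dx = stationary_drift_demo()
```

The zero boundary fluxes make the scheme conserve total mass exactly (up to rounding). The density relaxes toward the discrete analogue of $\rho^*$.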
• In this paper, we focus on applications in machine learning, optimization, and control that call for the resilient selection of a few elements, e.g. features, sensors, or leaders, against a number of adversarial denial-of-service attacks or failures. In general, such resilient optimization problems are hard, and cannot be solved exactly in polynomial time, even though they often involve objective functions that are monotone and submodular. Notwithstanding, in this paper we provide the first scalable, curvature-dependent algorithm for their approximate solution, that is valid for any number of attacks or failures, and which, for functions with low curvature, guarantees superior approximation performance. Notably, the curvature has been known to tighten approximations for several non-resilient maximization problems, yet its effect on resilient maximization had hitherto been unknown. We complement our theoretical analyses with supporting empirical evaluations.
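Below is a minimal sketch of a two-stage selection heuristic for resilient maximization, on a toy coverage objective (monotone and submodular). Stage 1 reserves the α elements with the largest individual values as likely attack targets. Stage 2 fills the remaining k - α slots greedily, starting from scratch. This reflects our understanding of the style of algorithm studied in this line of work. The exact algorithm, and certainly its curvature-dependent guarantee, should be taken from the paper. `worst_case` brute-forces the adversary's most damaging α removals and is practical only for small instances.

```python
from itertools import combinations

def coverage(sets, chosen):
    """Number of distinct items covered by the chosen elements (monotone submodular)."""
    covered = set()
    for e in chosen:
        covered |= sets[e]
    return len(covered)

def resilient_select(sets, k, alpha):
    # Stage 1: the alpha elements with the largest individual values (ties by name).
    bait = sorted(sets, key=lambda e: (-coverage(sets, [e]), e))[:alpha]
    # Stage 2: plain greedy, started from scratch, over the remaining elements.
    rest = sorted(e for e in sets if e not in bait)
    second = []
    while len(second) < k - alpha and rest:
        base = coverage(sets, second)
        best = max(rest, key=lambda e: coverage(sets, second + [e]) - base)
        second.append(best)
        rest.remove(best)
    return bait + second

def worst_case(sets, chosen, alpha):
    """Value left after the adversary removes the alpha most damaging chosen elements."""
    return min(coverage(sets, [e for e in chosen if e not in removed])
               for removed in combinations(chosen, alpha))

sets = {"a": {1, 2, 3, 4, 5, 6}, "b": {1, 2, 3}, "c": {4, 5, 6}, "d": {7, 8}}
chosen = resilient_select(sets, k=2, alpha=1)
```

On this instance plain greedy would pick `a` and then `d`, keeping only 2 covered items after `a` is attacked. The two-stage choice of `a` and `b` keeps 3.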
• Recently, the dynamics of signed networks, where the ties among the agents can be either positive (attractive) or negative (repulsive), have attracted substantial attention from the research community. Examples of such networks are models of opinion dynamics over signed graphs, recently introduced by Altafini (2012, 2013) and extended to the discrete-time case by Meng et al. (2014). It has been shown that, under mild connectivity assumptions, these protocols ensure the convergence of opinions in absolute value, whereas their signs may differ. This "modulus consensus" may correspond to the polarization of the opinions (or bipartite consensus, including the usual consensus as a special case), or to their convergence to zero. In this paper, we demonstrate that the phenomenon of modulus consensus in the discrete-time Altafini model is a manifestation of a more general and profound fact regarding the solutions of a special recurrent inequality. Although such a recurrent inequality does not determine its solution uniquely, it can be shown that, under some natural assumptions, each of its bounded solutions has a limit and, moreover, converges to consensus. A similar property has previously been established for special continuous-time differential inequalities (Proskurnikov, Cao, 2016). Besides the analysis of signed networks, we link the consensus properties of recurrent inequalities to the convergence analysis of distributed optimization algorithms and to the problems of Schur stability of substochastic matrices.
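Here is a small simulation of the discrete-time Altafini model $x(k+1) = (I - \varepsilon L_s)x(k)$, where $L_s = \mathrm{diag}(\sum_j |a_{ij}|) - A$ is the signed Laplacian. We assume a structurally balanced four-agent complete graph, with positive ties inside the groups {1, 2} and {3, 4} and negative ties across them, and ε = 0.2. These values are illustrative.

```python
import numpy as np

def altafini_matrix(A_signed, eps):
    """Iteration matrix I - eps * L_s with the signed Laplacian L_s."""
    deg = np.abs(A_signed).sum(axis=1)
    L_s = np.diag(deg) - A_signed
    return np.eye(len(A_signed)) - eps * L_s

s = np.array([1.0, 1.0, -1.0, -1.0])      # group membership (gauge signs)
A_signed = np.outer(s, s)                  # +1 within groups, -1 across
np.fill_diagonal(A_signed, 0.0)
P = altafini_matrix(A_signed, eps=0.2)

x = np.array([1.0, 2.0, -3.0, 0.5])       # initial opinions
for _ in range(60):
    x = P @ x
```

The opinions converge in absolute value to 1.375, with opposite signs in the two groups (bipartite consensus). The value 1.375 is the average of the gauge-transformed initial state $(1, 2, 3, -0.5)$.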
• In this paper we consider the problem of identifying intersections between two sets of d-dimensional axis-parallel rectangles. This is a common problem that arises in many agent-based simulation studies, and is of central importance in the context of High Level Architecture (HLA), where it is at the core of the Data Distribution Management (DDM) service. Several realizations of the DDM service have been proposed; however, many of them are either inefficient or inherently sequential. These are serious limitations since multicore processors are now ubiquitous, and DDM algorithms, being CPU-intensive, could benefit from additional computing power. We propose a parallel version of the Sort-Based Matching algorithm for shared-memory multiprocessors. Sort-Based Matching is one of the most efficient serial algorithms for the DDM problem, but is quite difficult to parallelize due to data dependencies. We describe the algorithm and compute its asymptotic running time; we complete the analysis by assessing its performance and scalability through extensive experiments on two commodity multicore systems, one based on a dual-socket Intel Xeon processor and one on a single-socket Intel Core i7 processor.
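The following simplified serial sweep, in the spirit of sort-based matching, is offered for intuition only; it is not the paper's parallel algorithm. In each dimension, all interval endpoints are sorted. When an interval opens, it is matched against every currently open interval of the other kind. The d-dimensional result is the intersection of the per-dimension matches. Extents are assumed to be closed intervals, and all names are illustrative.

```python
def match_1d(updates, subs):
    """updates, subs: name -> (lo, hi) closed intervals. Returns overlapping (update, sub) pairs."""
    events = []
    for side, extents in (("U", updates), ("S", subs)):
        for name, (lo, hi) in extents.items():
            events.append((lo, 0, side, name))   # 0 = interval opens
            events.append((hi, 1, side, name))   # 1 = interval closes
    events.sort()   # at equal coordinates, openings sort before closings (closed intervals)
    open_u, open_s, pairs = set(), set(), set()
    for _, kind, side, name in events:
        if kind == 1:
            (open_u if side == "U" else open_s).discard(name)
        elif side == "U":
            pairs.update((name, s) for s in open_s)
            open_u.add(name)
        else:
            pairs.update((u, name) for u in open_u)
            open_s.add(name)
    return pairs

def match(updates, subs):
    """d-dimensional extents overlap iff they overlap in every dimension."""
    dims = len(next(iter(updates.values())))
    result = None
    for k in range(dims):
        pairs = match_1d({n: r[k] for n, r in updates.items()},
                         {n: r[k] for n, r in subs.items()})
        result = pairs if result is None else result & pairs
    return result

updates = {"u1": [(0, 2), (0, 2)], "u2": [(5, 6), (0, 1)]}
subs = {"s1": [(1, 3), (1, 3)], "s2": [(4, 5.5), (3, 4)]}
```

Here `u2` and `s2` overlap in the first dimension but not in the second, so only `("u1", "s1")` survives the intersection.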
• This paper focuses on a passivity-based distributed reference governor (RG) applied to a pre-stabilized mobile robotic network. The novelty of this paper lies in the method used to solve the RG problem, where a passivity-based distributed optimization scheme is proposed. In particular, the gradient descent method minimizes the global objective function while the dual ascent method maximizes the Hamiltonian. To make the agents converge to the agreed optimal solution, a proportional-integral consensus estimator is used. This paper proves, through passivity arguments, the convergence of the RG's state estimates to the optimal solution under the assumption that the physical system is static. Then, the effectiveness of the scheme when the dynamics of the physical system are taken into account is demonstrated through simulations and experiments.
• This paper presents the first ever approach for solving *continuous-observation* Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) and their semi-Markovian counterparts, Dec-POSMDPs. This contribution is especially important in robotics, where a vast number of sensors provide continuous observation data. A continuous-observation policy representation is introduced using Stochastic Kernel-based Finite State Automata (SK-FSAs). An SK-FSA search algorithm titled Entropy-based Policy Search using Continuous Kernel Observations (EPSCKO) is introduced and applied to the first ever continuous-observation Dec-POMDP/Dec-POSMDP domain, where it significantly outperforms state-of-the-art discrete approaches. This methodology is equally applicable to Dec-POMDPs and Dec-POSMDPs, though the empirical analysis presented focuses on Dec-POSMDPs due to their higher scalability. To improve convergence, an entropy injection policy search acceleration approach for both continuous and discrete observation cases is also developed and shown to improve convergence rates without degrading policy quality.
• We consider Arrovian aggregation of preferences over lotteries that are represented by skew-symmetric bilinear (SSB) utility functions, a significant generalization of von Neumann-Morgenstern utility functions due to Fishburn, in which utility is assigned to pairs of alternatives. We show that the largest domain of preferences that simultaneously allows for independence of irrelevant alternatives and Pareto optimality when comparing lotteries based on accumulated SSB welfare is a domain in which preferences over lotteries are completely determined by ordinal preferences over pure alternatives. In particular, a lottery is preferred to another lottery if and only if the former is more likely to return a preferred alternative. Preferences over pure alternatives are unrestricted. We argue that SSB welfare maximization for this domain constitutes an appealing probabilistic social choice function.
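For ordinal preferences, the lottery comparison described above can be computed directly. Let $\phi(x, y) = +1$ if $x$ is preferred to $y$, $-1$ if $y$ is preferred to $x$, and $0$ otherwise. Lottery $p$ is preferred to $q$ iff $\sum_{x,y} p(x)q(y)\phi(x,y) > 0$, that is, iff an independent draw from $p$ is more likely to beat a draw from $q$ than the reverse. Below is a sketch with exact fractions, assuming a single strict ranking; `ssb_value` is a hypothetical name.

```python
from fractions import Fraction

def ssb_value(p, q, ranking):
    """Sum over x, y of p(x) q(y) phi(x, y), with phi = +1 / -1 / 0 from a strict ranking."""
    rank = {alt: r for r, alt in enumerate(ranking)}   # smaller index = more preferred
    total = Fraction(0)
    for x, px in p.items():
        for y, qy in q.items():
            if rank[x] < rank[y]:
                total += px * qy
            elif rank[x] > rank[y]:
                total -= px * qy
    return total

ranking = ["a", "b", "c"]
sure_b = {"b": Fraction(1)}
even = {"a": Fraction(1, 2), "c": Fraction(1, 2)}
skewed = {"a": Fraction(3, 5), "c": Fraction(2, 5)}
```

Under this relation the 50/50 lottery over a and c is exactly as good as receiving b for sure, while the 60/40 lottery is strictly better. The value is skew-symmetric: swapping the arguments flips its sign.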
• The model presented in this paper experiments with a comprehensive simulant agent in order to provide an exploratory platform in which simulation modelers may try alternative scenarios and participate in policy decision-making. The framework is built in a computationally distributed online format in which users can join in and visually explore the results. Modeled activity involves daily routine errands, such as shopping, visiting the doctor or engaging in the labor market. Further, agents make everyday decisions based on individual behavioral attributes and minimal requirements, according to social and contagion networks. Fully developed firms and governments are also included in the model, allowing for tax collection, production decisions, bankruptcy and change in ownership. The contributions to the literature are manifold. They include (a) a comprehensive model that details the activities and processes of agents and firms, and the original simultaneous use of (b) reinforcement learning for firm pricing and demand allocation; (c) social contagion for disease spreading and social networks for hiring opportunities; and (d) Bayesian networks for demographic-like generation of agents, all within (e) a visually rich environment that makes multiple uses of databases. Hence, the model provides a comprehensive framework in which interactions among citizens, firms and governments can be easily explored, allowing for learning and visualization of policies and scenarios.
• A distributed algorithm is described for finding a common fixed point of a family of $m>1$ nonlinear maps $M_i : \mathbb{R}^n \to \mathbb{R}^n$, assuming that each map is a paracontraction and that at least one such common fixed point exists. The common fixed point is computed simultaneously by $m$ agents, assuming each agent $i$ knows only $M_i$, the current estimates of the fixed point generated by its neighbors, and nothing more. Each agent recursively updates its estimate of a fixed point by utilizing the current estimates generated by each of its neighbors. Neighbor relations are characterized by a time-varying directed graph $N(t)$. It is shown, under suitably general conditions on $N(t)$, that the algorithm causes all agents' estimates to converge to the same common fixed point of the $m$ nonlinear maps.
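Here is a minimal sketch of an update in which each agent averages its neighbors' current estimates (itself included) and then applies its own map. We assume this update takes the form $x_i(t+1) = M_i\big(\tfrac{1}{|\mathcal{N}_i(t)|}\sum_{j \in \mathcal{N}_i(t)} x_j(t)\big)$. For illustration, each $M_i$ is the orthogonal projection onto a line in $\mathbb{R}^2$; projections onto nonempty closed convex sets are paracontractions. The neighbor graph alternates between a complete graph and a directed cycle, and the three lines meet at (1, 2).

```python
import numpy as np

def projection_onto_line(normal, offset):
    """Orthogonal projection onto {x : normal . x = offset}, a paracontraction."""
    n = np.asarray(normal, dtype=float)
    def M(x):
        return x - (n @ x - offset) / (n @ n) * n
    return M

maps = [projection_onto_line([1, 0], 1.0),    # {x : x_1 = 1}
        projection_onto_line([0, 1], 2.0),    # {x : x_2 = 2}
        projection_onto_line([1, 1], 3.0)]    # {x : x_1 + x_2 = 3}
graphs = [[[0, 1, 2], [0, 1, 2], [0, 1, 2]],  # complete graph
          [[0, 1], [1, 2], [2, 0]]]           # directed cycle with self-loops

x = [np.array([5.0, -4.0]), np.array([0.0, 0.0]), np.array([-3.0, 7.0])]
for t in range(400):
    nbrs = graphs[t % 2]                       # time-varying neighbor sets
    x = [maps[i](np.mean([x[j] for j in nbrs[i]], axis=0)) for i in range(3)]
```

Each agent only ever evaluates its own map, yet all three estimates converge to the common fixed point (1, 2).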
• Although various norms for reciprocity-based cooperation have been suggested that are evolutionarily stable against invasion from free riders, the process of alternation of norms and the role of diversified norms remain unclear in the evolution of cooperation. We clarify the co-evolutionary dynamics of norms and cooperation in indirect reciprocity and also identify the indispensable norms for the evolution of cooperation. Inspired by the gene knockout method, a genetic engineering technique, we developed the norm knockout method and clarified the norms necessary for the establishment of cooperation. The results of numerical investigations revealed that the majority of norms gradually transitioned to tolerant norms after defectors were eliminated by strict norms. Furthermore, no cooperation emerges when specific norms that are intolerant to defectors are knocked out.
• This paper revisits the problem of multi-agent consensus from a graph signal processing perspective. By defining the graph filter from the consensus protocol, we establish the direct relation between average consensus of multi-agent systems and filtering of graph signals. This relation not only provides new insights into average consensus, but also turns out to be a powerful tool for designing effective consensus protocols for uncertain networks, which are difficult to handle with existing time-domain methods. In this paper, we consider two cases: uncertain networks modeled by an estimated Laplacian matrix and a fixed eigenvalue bound, and connected graphs with unknown topology. Consensus protocols are designed for both cases based on the protocol filter. Several numerical examples are given to demonstrate the effectiveness of our methods.
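The filter view can be checked numerically. Write $L = U\Lambda U^\top$. The consensus iteration $x(k) = (I - \varepsilon L)^k x(0)$ then equals the graph filter $U\,\mathrm{diag}\big(h(\lambda)\big)\,U^\top x(0)$ with frequency response $h(\lambda) = (1 - \varepsilon\lambda)^k$. For a connected graph and $0 < \varepsilon < 2/\lambda_{\max}$, we have $h(0) = 1$ and $|h(\lambda)| < 1$ for every other eigenvalue, so the filter keeps only the average. Below is a sketch on an assumed four-node path graph with ε = 0.3.

```python
import numpy as np

L = np.array([[ 1., -1.,  0.,  0.],
              [-1.,  2., -1.,  0.],
              [ 0., -1.,  2., -1.],
              [ 0.,  0., -1.,  1.]])      # Laplacian of the path graph 1-2-3-4
eps, k = 0.3, 80
x0 = np.array([4.0, 0.0, 2.0, -2.0])

# Time domain: k steps of the consensus protocol x <- (I - eps L) x.
x_time = np.linalg.matrix_power(np.eye(4) - eps * L, k) @ x0

# Graph-frequency domain: transform, scale by h(lambda), transform back.
lam, U = np.linalg.eigh(L)
h = (1 - eps * lam) ** k
x_graph = U @ (h * (U.T @ x0))
```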
• The integration of multiple viewpoints has become an increasingly popular approach to agent-based simulation. Despite their disparities, recent approaches successfully manage to run such multi-level simulations. Yet, are they doing it appropriately? This paper tries to answer that question, with an analysis based on a generic model of the temporal dynamics of multi-level simulations. This generic model is then used to build an orthogonal approach to multi-level simulation called SIMILAR. In this approach, most time-related issues are explicitly modeled, owing to an implementation-oriented approach based on the influence/reaction principle.
• The problem of achieving common understanding between agents that use different vocabularies has been mainly addressed by designing techniques that explicitly negotiate mappings between their vocabularies, requiring agents to share a meta-language. In this paper we consider the case of agents that use different vocabularies and have no meta-language in common, but share the knowledge of how to perform a task, given by the specification of an interaction protocol. For this situation, we present a framework that lets agents learn a vocabulary alignment from the experience of interacting. Unlike previous work in this direction, we use open protocols that constrain possible actions instead of defining procedures, making our approach more general. We present two techniques that can be used either to learn an alignment from scratch or to repair an existing one, and we experimentally evaluate their performance.
• Epistemic planning can be used for decision making in multi-agent situations with distributed knowledge and capabilities. Recently, Dynamic Epistemic Logic (DEL) has been shown to provide a very natural and expressive framework for epistemic planning. We extend the DEL-based epistemic planning framework to include perspective shifts, allowing us to define new notions of sequential and conditional planning with implicit coordination. With these, it is possible to solve planning tasks with joint goals in a decentralized manner without the agents having to negotiate about and commit to a joint policy at plan time. First we define the central planning notions and sketch the implementation of a planning system built on those notions. Afterwards we provide some case studies in order to evaluate the planner empirically and to show that the concept is useful for multi-agent systems in practice.
• Epistemic planning can be used for decision making in multi-agent situations with distributed knowledge and capabilities. Dynamic Epistemic Logic (DEL) has been shown to provide a very natural and expressive framework for epistemic planning. In this paper, we aim to give an accessible introduction to DEL-based epistemic planning. The paper starts with the most classical framework for planning, STRIPS, and then moves towards epistemic planning in a number of smaller steps, where each step is motivated by the need to be able to model more complex planning scenarios.
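The classical STRIPS starting point can be made concrete with a minimal Python planner. A state is a set of atoms. An action applies when its preconditions hold, and it replaces the state by $(s \setminus \mathit{del}) \cup \mathit{add}$. Breadth-first search then returns a shortest plan. The one-robot, one-box domain is a made-up example, and the function name is illustrative.

```python
from collections import deque

def strips_plan(init, goal, actions):
    """Breadth-first forward search. actions: name -> (pre, add, delete) sets of atoms."""
    start = frozenset(init)
    parent = {start: None}              # state -> (previous state, action name)
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if goal <= state:               # all goal atoms hold
            plan = []
            while parent[state] is not None:
                state, name = parent[state]
                plan.append(name)
            return plan[::-1]
        for name, (pre, add, delete) in sorted(actions.items()):
            if pre <= state:
                nxt = (state - delete) | add
                if nxt not in parent:
                    parent[nxt] = (state, name)
                    frontier.append(nxt)
    return None                         # goal unreachable

actions = {
    "move_a_b": ({"at_a"}, {"at_b"}, {"at_a"}),
    "move_b_a": ({"at_b"}, {"at_a"}, {"at_b"}),
    "pick":     ({"at_a", "box_at_a"}, {"holding"}, {"box_at_a"}),
    "drop_b":   ({"at_b", "holding"}, {"box_at_b"}, {"holding"}),
}
plan = strips_plan({"at_a", "box_at_a"}, {"box_at_b"}, actions)
```

The epistemic extensions then replace these sets of atoms with epistemic models and the add/delete lists with event models.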
• One of the key challenges for multi-agent learning is scalability. In this paper, we introduce a technique for speeding up multi-agent learning by exploiting concurrent and incremental experience sharing. This solution adaptively identifies opportunities to transfer experiences between agents and allows for the rapid acquisition of appropriate policies in large-scale, stochastic, homogeneous multi-agent systems. We introduce an online, distributed, supervisor-directed transfer technique for constructing high-level characterizations of an agent's dynamic learning environment, called contexts, which are used to identify groups of agents operating under approximately similar dynamics within a short temporal window. A set of supervisory agents computes contextual information for groups of subordinate agents, thereby identifying candidates for experience sharing. Our method uses a tiered architecture to propagate, with low communication overhead, state, action, and reward data amongst the members of each dynamically-identified information-sharing group. We applied this method to a large-scale distributed task allocation problem with hundreds of information-sharing agents operating in an unknown, non-stationary environment. We demonstrate that our approach results in significant performance gains, that it is robust to noise-corrupted or suboptimal context features, and that communication costs scale linearly with the supervisor-to-subordinate ratio.
• This paper studies power allocation for distributed estimation of an unknown scalar random source in sensor networks with a multiple-antenna fusion center (FC), where wireless sensors are equipped with radio-frequency-based energy harvesting technology. Each sensor's observation is locally processed using an uncoded amplify-and-forward scheme. The processed signals are then sent to the FC and coherently combined there, and the best linear unbiased estimator (BLUE) is adopted for reliable estimation. We aim to solve the following two power allocation problems: 1) minimizing distortion under various power constraints; and 2) minimizing total transmit power under distortion constraints, where the distortion is measured in terms of the mean-squared error of the BLUE. Two iterative algorithms are developed to solve the non-convex problems, which converge at least to a local optimum. In particular, the above algorithms are designed to jointly optimize the amplification coefficients, energy beamforming, and receive filtering. For each problem, a suboptimal design, a single-antenna FC scenario, and a common harvester deployment for colocated sensors are also studied. Using the powerful semidefinite relaxation framework, our result is shown to be valid for any number of sensors, each with different noise power, and for an arbitrary number of antennas at the FC.
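For intuition on the estimator at the fusion center, here is the generic BLUE for a linear model $y = h\theta + n$ with zero-mean noise covariance $C$. The estimate is $\hat\theta = (h^\top C^{-1}h)^{-1}h^\top C^{-1}y$, and its mean-squared error is $(h^\top C^{-1}h)^{-1}$. The sketch assumes that the effective gains `h` are already given; in the paper they would depend on the amplification coefficients, channels, and receive filter. It treats $\theta$ as a scalar, and `blue` is an illustrative name.

```python
import numpy as np

def blue(h, C, y):
    """Best linear unbiased estimate of scalar theta from y = h*theta + n, Cov(n) = C."""
    Ci_h = np.linalg.solve(C, h)      # C^{-1} h
    info = h @ Ci_h                   # h^T C^{-1} h
    estimate = (Ci_h @ y) / info      # (h^T C^{-1} h)^{-1} h^T C^{-1} y, using C symmetric
    mse = 1.0 / info                  # error variance of the BLUE
    return estimate, mse

h = np.array([1.0, 2.0])              # effective gains of two sensors (assumed)
C = np.diag([1.0, 4.0])               # independent noise with unequal powers
y = np.array([3.0, 4.0])
estimate, mse = blue(h, C, y)
```

Noisier sensors are down-weighted through $C^{-1}$, and the error shrinks as the effective signal-to-noise terms $h_i^2/\sigma_i^2$ add up.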
• One-sided matching mechanisms are fundamental for assigning a set of indivisible objects to a set of self-interested agents when monetary transfers are not allowed. Two widely-studied randomized mechanisms in multiagent settings are the Random Serial Dictatorship (RSD) and the Probabilistic Serial Rule (PS). Both mechanisms require only that agents specify ordinal preferences and have a number of desirable economic and computational properties. However, the induced outcomes of the mechanisms are often incomparable and thus there are challenges when it comes to deciding which mechanism to adopt in practice. In this paper, we first consider the space of general ordinal preferences and provide empirical results on the (in)comparability of RSD and PS. We analyze their respective economic properties under general and lexicographic preferences. We then instantiate utility functions with the goal of gaining insights on the manipulability, efficiency, and envy-freeness of the mechanisms under different risk-attitude models. Our results hold under various preference distribution models and further support the broad use of RSD in most practical applications.
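Both mechanisms are short to state. The sketch below computes the random assignment of PS with the "eating" procedure, in which every agent eats its favorite remaining object at unit speed. It computes RSD by averaging serial dictatorship over all agent orderings, which is only practical for a handful of agents. It assumes equal numbers of agents and objects and complete strict preference lists. The four-agent profile is an illustrative choice.

```python
from fractions import Fraction
from itertools import permutations

def probabilistic_serial(prefs):
    """Eating algorithm: all agents eat their best remaining object at unit speed."""
    objects = sorted(prefs[0])
    supply = {o: Fraction(1) for o in objects}
    share = [{o: Fraction(0) for o in objects} for _ in prefs]
    clock = Fraction(0)
    while clock < 1:
        eating = [next((o for o in p if supply[o] > 0), None) for p in prefs]
        eaters = {}
        for o in eating:
            if o is not None:
                eaters[o] = eaters.get(o, 0) + 1
        if not eaters:
            break
        # Advance time until the first object being eaten runs out.
        step = min([supply[o] / n for o, n in eaters.items()] + [1 - clock])
        for i, o in enumerate(eating):
            if o is not None:
                share[i][o] += step
        for o, n in eaters.items():
            supply[o] -= n * step
        clock += step
    return share

def random_serial_dictatorship(prefs):
    """Average of serial dictatorship over all orderings (exponential; tiny inputs only)."""
    objects = sorted(prefs[0])
    share = [{o: Fraction(0) for o in objects} for _ in prefs]
    orders = list(permutations(range(len(prefs))))
    for order in orders:
        taken = set()
        for i in order:
            o = next(o for o in prefs[i] if o not in taken)
            taken.add(o)
            share[i][o] += Fraction(1, len(orders))
    return share

prefs = [list("abcd"), list("abcd"), list("badc"), list("badc")]
ps = probabilistic_serial(prefs)
rsd = random_serial_dictatorship(prefs)
```

For agent 1, PS gives a and c with probability 1/2 each. RSD gives a, b, c, d with probabilities 5/12, 1/12, 5/12, 1/12. In this profile the PS lottery first-order stochastically dominates the RSD lottery for agent 1.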
• We consider elections where the voters come one at a time, in a streaming fashion, and devise space-efficient algorithms which identify an approximate winning committee with respect to common multiwinner proportional representation voting rules; specifically, we consider the Approval-based and the Borda-based variants of both the Chamberlin--Courant rule and the Monroe rule. We complement our algorithms with lower bounds. Somewhat surprisingly, our results imply that, using space that does not depend on the number of voters, it is possible to efficiently identify an approximate representative committee of fixed size over vote streams with a huge number of voters.
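One natural way to use space independent of the number of voters is to keep a fixed-size uniform sample of the stream (reservoir sampling) and run greedy approval-based Chamberlin--Courant on that sample. This is our own illustration rather than the authors' algorithm. Names are hypothetical, and no approximation guarantee is claimed for this sketch.

```python
import random

def reservoir_sample(stream, size, rng):
    """Uniform sample of `size` ballots from a stream of unknown length (Algorithm R)."""
    sample = []
    for t, ballot in enumerate(stream):
        if t < size:
            sample.append(ballot)
        else:
            j = rng.randint(0, t)        # uniform over 0..t inclusive
            if j < size:
                sample[j] = ballot
    return sample

def greedy_approval_cc(ballots, committee_size):
    """Greedy approval Chamberlin--Courant: repeatedly add the candidate who
    represents the most not-yet-represented voters (ties broken by name)."""
    candidates = sorted(set().union(*ballots))
    committee, represented = [], set()
    for _ in range(committee_size):
        def gain(c):
            return sum(1 for i, b in enumerate(ballots) if i not in represented and c in b)
        best = max((c for c in candidates if c not in committee), key=gain)
        committee.append(best)
        represented |= {i for i, b in enumerate(ballots) if best in b}
    return committee

# A generator keeps the full stream out of memory; only the 200-ballot sample is stored.
stream = ({"x", "y"} if t % 2 else {"x", "z"} for t in range(100_000))
sample = reservoir_sample(stream, 200, random.Random(0))
committee = greedy_approval_cc(sample, 1)
```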
• Multi-agent systems communication is a technology that provides a way for multiple interacting intelligent agents to communicate with each other and with the environment. Multi-agent systems are used to solve problems that are difficult for an individual agent to solve. Multi-agent communication technologies can be used for the management and organization of a computing fog and can act as a global, distributed operating system. In the present publication we propose a technology that combines decentralized P2P distribution of BOINC general-purpose computing tasks, a multi-agent communication protocol, and smart-contract-based rewards powered by the Ethereum blockchain. Such a system can be used as a distributed P2P computing-power market, protected from any central authority. This decentralized market can further be extended into a system that learns the most efficient way to use and optimize software-hardware combinations. Once the system learns to optimize software-hardware efficiency, it can be extended into a general-purpose distributed intelligence that acts as a combination of single-purpose AIs.
• Many societal decision problems lie in high-dimensional continuous spaces not amenable to the voting techniques common for their discrete or single-dimensional counterparts. These problems are typically discretized before running an election or decided upon through negotiation by representatives. We propose a meta-algorithm called *Iterative Local Voting* for collective decision-making in this setting, in which voters are sequentially sampled and asked to modify a candidate solution within some local neighborhood of its current value, as defined by a ball in some chosen norm. In general, such schemes do not converge, or, when they do, the resulting solution does not have a natural description. We first prove the convergence of this algorithm under appropriate choices of neighborhoods to plausible solutions in certain natural settings: when the voters' utilities can be expressed in terms of some form of distance from their ideal solution, and when these utilities are additively decomposable across dimensions. In many of these cases, we obtain convergence to the societal welfare maximizing solution. We then describe an experiment in which we test our algorithm for the decision of the U.S. Federal Budget on Mechanical Turk with over 4,000 workers, employing neighborhoods defined by $\mathcal{L}^1, \mathcal{L}^2$ and $\mathcal{L}^\infty$ balls. We make several observations that inform future implementations of such a procedure.
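Here is a sketch of Iterative Local Voting with $\mathcal{L}^2$ balls, under assumptions of our own. Each voter's utility is minus the Euclidean distance to an ideal point, so its best response within a ball of radius $r$ around the current $x$ is the projection of its ideal point onto that ball. The radius shrinks as $1/\sqrt{t+1}$. Each move is then a stochastic subgradient step on total distance, so the iterate should approach a geometric median of the ideal points. For the ideal points chosen below, that median is the origin. Function names are illustrative.

```python
import numpy as np

def best_point_in_ball(x, ideal, radius):
    """Point of the L2 ball around x closest to the voter's ideal (projection)."""
    d = ideal - x
    dist = np.linalg.norm(d)
    return ideal.copy() if dist <= radius else x + radius * d / dist

def iterative_local_voting(ideals, x0, steps, rng):
    x = np.asarray(x0, dtype=float)
    for t in range(steps):
        voter = ideals[rng.integers(len(ideals))]          # sample one voter at random
        x = best_point_in_ball(x, voter, 1.0 / np.sqrt(t + 1))
    return x

ideals = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
x = iterative_local_voting(ideals, x0=[3.0, 3.0], steps=20_000, rng=np.random.default_rng(0))
```

With $\mathcal{L}^1$ or $\mathcal{L}^\infty$ balls the best response is no longer a simple radial step, which is one reason the choice of norm matters.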
• In this paper a decentralized control algorithm for systems composed of $N$ dynamically decoupled agents, coupled by feasibility constraints, is presented. The control problem is divided into $N$ optimal control sub-problems and a communication scheme is proposed to decouple computations. The derivative of the solution of each sub-problem is used to approximate the evolution of the system allowing the algorithm to decentralize and parallelize computations. The effectiveness of the proposed algorithm is shown through simulations in a cooperative driving scenario.
• We propose distributed online open loop planning (DOOLP), a general framework for online multiagent coordination and decision making under uncertainty. DOOLP is based on online heuristic search in the space defined by a generative model of the domain dynamics, which is exploited by agents to simulate and evaluate the consequences of their potential choices. We also propose distributed online Thompson sampling (DOTS) as an effective instantiation of the DOOLP framework. DOTS models sequences of agent choices by concatenating a number of multiarmed bandits for each agent and uses Thompson sampling for dealing with action value uncertainty. The Bayesian approach underlying Thompson sampling allows agents to effectively model and estimate uncertainty about (a) their own action values and (b) other agents' behavior. This approach yields a principled and statistically sound solution to the exploration-exploitation dilemma when exploring large search spaces with limited resources. We implemented DOTS in a smart factory case study with positive empirical results. We observed effective, robust and scalable planning and coordination capabilities even when only searching a fraction of the potential search space.
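Below is a minimal sketch of Thompson sampling for a single Bernoulli multiarmed bandit with Beta(1, 1) priors, the building block that DOTS instantiates per agent. The chaining of bandits across agents and the modeling of other agents' behavior are not shown. The arm probabilities and the seed are illustrative.

```python
import random

def thompson_sampling(true_probs, rounds, rng):
    k = len(true_probs)
    alpha = [1] * k                      # Beta(1, 1) = uniform prior on each arm's mean
    beta = [1] * k
    pulls = [0] * k
    for _ in range(rounds):
        # Sample a plausible mean for every arm from its posterior, play the best sample.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        arm = max(range(k), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward             # conjugate Beta-Bernoulli update
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls, alpha, beta

pulls, alpha, beta = thompson_sampling([0.2, 0.8], rounds=2000, rng=random.Random(1))
```

Posterior sampling explores an arm only as often as it might still be the best one, which is what makes the approach attractive for searching large spaces with a limited budget.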