# Multiagent Systems (cs.MA)

• The development and deployment of Autonomous Vehicles (AVs) on our roads is not only realistic in the near future but can also bring significant benefits. In particular, it can potentially solve several problems relating to vehicles and traffic, for instance: (i) possible reduction of traffic congestion, with the consequence of improved fuel economy and reduced driver inactivity; (ii) possible reduction in the number of accidents, assuming that an AV can minimise the human errors that often cause traffic accidents; and (iii) increased ease of parking, especially when one considers the potential for shared AVs. In order to deploy an AV there are significant steps that must be completed in terms of hardware and software. As expected, software components play a key role in the complex AV system and so, at least for safety, we should assess the correctness of these components. In this paper, we are concerned with the high-level software component(s) responsible for the decisions in an AV. We intend to model an AV capable of navigation; obstacle avoidance; obstacle selection (when a crash is unavoidable) and vehicle recovery, etc, using a rational agent. To achieve this, we have established the following stages. First, the agent plans and actions have been implemented within the Gwendolen agent programming language. Second, we have built a simulated automotive environment in the Java language. Third, we have formally specified some of the required agent properties through LTL formulae, which are then formally verified with the AJPF verification tool. Finally, within the MCAPL framework (which comprises all the tools used in previous stages) we have obtained formal verification of our AV agent in terms of its specific behaviours. For example, the agent plans responsible for selecting an obstacle with low potential damage, instead of a higher damage obstacle (when possible) can be formally verified within MCAPL. 
We must emphasise that the major goal (of our present approach) lies in the formal verification of agent plans, rather than evaluating real-world applications. For this reason we utilised a simple matrix representation concerning the environment used by our agent.
• We discuss the connection between computational social choice (comsoc) and computational complexity. We stress the work so far on, and urge continued focus on, two less-recognized aspects of this connection. Firstly, this is very much a two-way street: Everyone knows complexity classification is used in comsoc, but we also highlight benefits to complexity that have arisen from its use in comsoc. Secondly, more subtle, less-known complexity tools often can be very productively used in comsoc.
• The goal of this paper is to present an end-to-end, data-driven framework to control Autonomous Mobility-on-Demand systems (AMoD, i.e. fleets of self-driving vehicles). We first model the AMoD system using a time-expanded network, and present a formulation that computes the optimal rebalancing strategy (i.e., preemptive repositioning) and the minimum feasible fleet size for a given travel demand. Then, we adapt this formulation to devise a Model Predictive Control (MPC) algorithm that leverages short-term demand forecasts based on historical data to compute rebalancing strategies. We test the end-to-end performance of this controller with a state-of-the-art LSTM neural network to predict customer demand and real customer data from DiDi Chuxing: we show that this approach scales very well for large systems (indeed, the computational complexity of the MPC algorithm does not depend on the number of customers and of vehicles in the system) and outperforms state-of-the-art rebalancing strategies by reducing the mean customer wait time by up to 89.6%.
• Decentralized control of robots has attracted considerable research interest. However, some of the research used unrealistic assumptions without collision avoidance. This report focuses on the collision-free control for multiple robots in both complete coverage and search tasks in 2D and 3D areas that are arbitrary and unknown. All algorithms are decentralized, as robots have limited abilities, and they are proved mathematically. The report starts with the grid selection in the two tasks. Grid patterns simplify the representation of the area and robots only need to move in straight lines between neighboring vertices. For the 100% complete 2D coverage, the equilateral triangular grid is proposed. For the complete coverage ignoring the boundary effect, the grid with the fewest vertices is calculated in every situation for both 2D and 3D areas. The second part is for the complete coverage in 2D and 3D areas. A decentralized collision-free algorithm with the above selected grid is presented driving robots to sections which are furthest from the reference point. The area can be static or expanding, and the algorithm is simulated in MATLAB. Thirdly, three grid-based decentralized random algorithms with collision avoidance are provided to search targets in 2D or 3D areas. The number of targets can be known or unknown. In the first algorithm, robots choose vacant neighbors randomly with priorities on unvisited ones while the second one adds the repulsive force to disperse robots if they are close. In the third algorithm, if surrounded by visited vertices, the robot will use the breadth-first search algorithm to go to one of the nearest unvisited vertices via the grid. The second search algorithm is verified on Pioneer 3-DX robots. The general way to generate the formula to estimate the search time is demonstrated. Algorithms are compared with five other algorithms in MATLAB to show their effectiveness.
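The breadth-first step mentioned in the abstract above (a robot surrounded by visited vertices walks to one of the nearest unvisited ones) can be sketched as follows. This is only an illustration, not the report's algorithm: the 4-connected square grid, the function names `nearest_unvisited` and `grid_neighbors`, and the treatment of occupied vertices as blocked are all assumptions made here, whereas the report itself proposes other grids such as the equilateral triangular one.

```python
from collections import deque


def grid_neighbors(width, height):
    """Return a neighbor function for a 4-connected square grid (assumed layout)."""
    def neighbors(vertex):
        x, y = vertex
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                yield (nx, ny)
    return neighbors


def nearest_unvisited(start, visited, occupied, neighbors):
    """Breadth-first search from `start` to a closest unvisited vertex.

    Vertices in `occupied` (hypothetically, held by other robots) are never
    entered. Returns the path from `start` to the target, or None.
    """
    parent = {start: None}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex != start and vertex not in visited:
            # Walk the parent links back to the start to rebuild the path.
            path = []
            while vertex is not None:
                path.append(vertex)
                vertex = parent[vertex]
            return path[::-1]
        for nxt in neighbors(vertex):
            if nxt not in parent and nxt not in occupied:
                parent[nxt] = vertex
                queue.append(nxt)
    return None
```

Because breadth-first search expands vertices in order of hop distance, the first unvisited vertex it dequeues is one of the nearest in grid steps, which is the property the abstract relies on.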
• Due to the complexity of the natural world, a programmer cannot foresee all possible situations a connected and autonomous vehicle (CAV) will face during its operation, and hence, CAVs will need to learn to make decisions autonomously. Due to the sensing of its surroundings and information exchanged with other vehicles and road infrastructure, a CAV will have access to large amounts of useful data. This paper investigates a data-driven driving policy learning framework through agent-based learning. A reinforcement learning framework is presented in the paper, which simulates the self-evolution of a CAV over its lifetime. The results indicate that over time the CAVs are able to learn useful policies to avoid crashes and achieve their objectives in more efficient ways. Vehicle-to-vehicle (V2V) communication in particular enables additional useful information to be acquired by CAVs, which in turn enables CAVs to learn driving policies more efficiently. The simulation results indicate that while a CAV can learn to make autonomous decisions, V2V communication of information improves this capability. Future work will investigate complex driving policies such as roundabout negotiations, cooperative learning between CAVs and deep reinforcement learning to traverse larger state spaces.
• In this paper, we conduct an empirical study on discovering the ordered collective dynamics obtained by a population of artificial intelligence (AI) agents. Our intention is to put AI agents into a simulated natural context, and then to understand their induced dynamics at the population level. In particular, we aim to verify if the principles developed in the real world could also be used in understanding an artificially-created intelligent population. To achieve this, we simulate a large-scale predator-prey world, where the laws of the world are designed by only the findings or logical equivalence that have been discovered in nature. We endow the agents with the intelligence based on deep reinforcement learning, and scale the population size up to millions. Our results show that the population dynamics of AI agents, driven only by each agent's individual self interest, reveals an ordered pattern that is similar to the Lotka-Volterra model studied in population biology. We further discover the emergent behaviors of collective adaptations in studying how the agents' grouping behaviors will change with the environmental resources. Both of the two findings could be explained by the self-organization theory in nature.
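For readers unfamiliar with the Lotka-Volterra model referenced in the abstract above, the classical two-species form is dx/dt = αx − βxy for prey and dy/dt = δxy − γy for predators. The sketch below integrates it with a fourth-order Runge-Kutta step. The parameter values and function names are illustrative assumptions and are not taken from the paper, whose agents are trained with deep reinforcement learning rather than simulated from these equations.

```python
import math


def lv_derivatives(prey, pred, alpha, beta, delta, gamma):
    """Right-hand side of the classical Lotka-Volterra equations."""
    return alpha * prey - beta * prey * pred, delta * prey * pred - gamma * pred


def rk4_step(prey, pred, dt, params):
    """Advance the populations by one fourth-order Runge-Kutta step."""
    k1x, k1y = lv_derivatives(prey, pred, *params)
    k2x, k2y = lv_derivatives(prey + 0.5 * dt * k1x, pred + 0.5 * dt * k1y, *params)
    k3x, k3y = lv_derivatives(prey + 0.5 * dt * k2x, pred + 0.5 * dt * k2y, *params)
    k4x, k4y = lv_derivatives(prey + dt * k3x, pred + dt * k3y, *params)
    return (prey + dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6,
            pred + dt * (k1y + 2 * k2y + 2 * k3y + k4y) / 6)


def simulate(prey0, pred0, steps, dt=0.01,
             alpha=1.0, beta=0.1, delta=0.075, gamma=1.5):
    """Return the list of (prey, predator) states; parameters are illustrative."""
    params = (alpha, beta, delta, gamma)
    trajectory = [(prey0, pred0)]
    for _ in range(steps):
        trajectory.append(rk4_step(*trajectory[-1], dt, params))
    return trajectory


def invariant(prey, pred, alpha=1.0, beta=0.1, delta=0.075, gamma=1.5):
    """Quantity conserved along exact solutions; useful as a sanity check."""
    return delta * prey - gamma * math.log(prey) + beta * pred - alpha * math.log(pred)
```

Solutions of this system cycle around the equilibrium (γ/δ, α/β), which is the kind of ordered oscillation the abstract compares the agent populations against.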
• We introduce an axiomatic approach to group recommendations, in line with previous work on the axiomatic treatment of trust-based recommendation systems, ranking systems, and other foundational work on the axiomatic approach to internet mechanisms in social choice settings. In group recommendations we wish to recommend to a group of agents, consisting of both opinionated and undecided members, a joint choice that would be acceptable to them. Such a system has many applications, such as choosing a movie or a restaurant to go to with a group of friends, recommending games for online game players, and other communal activities. Our method utilizes a given social graph to extract information on the undecided, relying on the agents influencing them. We first show that a set of fairly natural desired requirements (a.k.a. axioms) leads to an impossibility, rendering mutual satisfaction of them unreachable. However, we also show a modified set of axioms that fully axiomatize a group variant of the random-walk recommendation system, expanding a previous result from the individual recommendation case.
• This paper proposes a novel game-theoretical autonomous decision-making framework to address a task allocation problem for a swarm of multiple agents. We consider cooperation of self-interested agents and show that agents who have social inhibition can converge to a Nash stable partition (i.e., social agreement) using our proposed decentralised algorithm within polynomial time. The algorithm is simple and executable based on local interactions with neighbour agents under a strongly-connected communication network and even in asynchronous environments. We analytically present a mathematical formulation for computing the lower bound of a converged solution's suboptimality and additionally show that a suboptimality of at least 50% can be guaranteed if social utilities are non-decreasing functions with respect to the number of co-working agents. Through numerical experiments, it is confirmed that the proposed framework is scalable, adapts quickly to dynamic environments, and is robust even in a realistic situation where some of the agents temporarily stop operating during a mission.
• Swarm robotic systems are currently being used to address many real-world problems. One interesting application of swarm robotics is the self-organized formation of structures and shapes. Some of the key challenges in swarm robotic systems include swarm size constraints, random motion, coordination among robots, localization, and adaptability in a decentralized environment. Rubenstein et al. presented a system ("Programmable self-assembly in a thousand-robot swarm", Science, 2014) for a thousand-robot swarm able to form only solid shapes with the robots in aggregated form by applying the collective behavior algorithm. Even though agent-based approaches have been presented in various studies for self-organized formation, these studies lack an agent-based modeling (ABM) approach along with the constraints in terms of structure complexity and heterogeneity in large swarms with dynamic localization. The cognitive agent-based computing (CABC) approach is capable of modeling such self-organization based multi-agent systems (MAS). In this paper, we develop a simulation model using ABM under the CABC approach for self-organized shape formation in swarm robots. We propose a shape formation algorithm for validating our model and perform simulation-based experiments for six different shapes including hole-based shapes. We also demonstrate the formal specification for our model. The simulation results show the robustness of the proposed approach having the emergent behavior of robots for the self-organized shape formation. The performance of the proposed approach is evaluated by the robots' convergence rate.
• We introduce new theoretical insights into two-population asymmetric games allowing for an elegant symmetric decomposition into two single population symmetric games. Specifically, we show how an asymmetric bimatrix game (A,B) can be decomposed into its symmetric counterparts by envisioning and investigating the payoff tables (A and B) that constitute the asymmetric game, as two independent, single population, symmetric games. We reveal several surprising formal relationships between an asymmetric two-population game and its symmetric single population counterparts, which facilitate a convenient analysis of the original asymmetric game due to the dimensionality reduction of the decomposition. The main finding reveals that if (x,y) is a Nash equilibrium of an asymmetric game (A,B), this implies that y is a Nash equilibrium of the symmetric counterpart game determined by payoff table A, and x is a Nash equilibrium of the symmetric counterpart game determined by payoff table B. Also the reverse holds and combinations of Nash equilibria of the counterpart games form Nash equilibria of the asymmetric game. We illustrate how these formal relationships aid in identifying and analysing the Nash structure of asymmetric games, by examining the evolutionary dynamics of the simpler counterpart games in several canonical examples.
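The relationship stated in the abstract above can be checked numerically on a small example. The sketch below tests the Nash conditions for a bimatrix game and for a single-population symmetric game, then applies both to the mixed equilibrium of a Battle-of-the-Sexes style game. Two points are assumptions made here, not statements from the abstract: the counterpart game for B is read as the single-population game with payoff matrix B transposed, and the example uses a full-support equilibrium. All function names are hypothetical.

```python
def expected_payoffs(matrix, mixed):
    """Payoff of each pure row of `matrix` against the mixed strategy `mixed`."""
    return [sum(m * q for m, q in zip(row, mixed)) for row in matrix]


def transpose(matrix):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def is_best_response(payoffs, strategy, tol=1e-9):
    """True if every pure strategy played with positive weight is payoff-maximal."""
    best = max(payoffs)
    return all(p <= tol or best - u <= tol for p, u in zip(strategy, payoffs))


def is_bimatrix_nash(A, B, x, y, tol=1e-9):
    """Check that (x, y) is a Nash equilibrium of the bimatrix game (A, B)."""
    row_ok = is_best_response(expected_payoffs(A, y), x, tol)
    col_ok = is_best_response(expected_payoffs(transpose(B), x), y, tol)
    return row_ok and col_ok


def is_symmetric_nash(M, p, tol=1e-9):
    """Check that p is a best response to itself in the symmetric game M."""
    return is_best_response(expected_payoffs(M, p), p, tol)


# Illustrative game: row player prefers outcome 1, column player outcome 2.
A = [[3, 0], [0, 2]]
B = [[2, 0], [0, 3]]
x, y = [0.6, 0.4], [0.4, 0.6]  # mixed equilibrium derived by indifference

mixed_is_nash = is_bimatrix_nash(A, B, x, y)
y_in_counterpart_A = is_symmetric_nash(A, y)
x_in_counterpart_B = is_symmetric_nash(transpose(B), x)
```

In this example all three checks succeed, which is consistent with the direction of the result described in the abstract; it is a single instance and not a proof.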
• Since their inception, Multi Agent Systems (MASs) have been championed as a solution for the increasing problem of software complexity. Communities of distributed autonomous computing entities that are capable of collaborating, negotiating and acting to solve complex organisational and system management problems are an attractive proposition. Central to this is the requirement for agents to possess the capability of interacting with one another in a structured, consistent and organised manner. This thesis presents the Agent Conversation Reasoning Engine (ACRE), which constitutes a holistic view of communication management for MASs. ACRE is intended to facilitate the practical development, debugging and deployment of communication-heavy MASs. ACRE has been formally defined in terms of its operational semantics, and a generic architecture has been proposed to facilitate its integration with a wide variety of diverse agent development frameworks and Agent Oriented Programming (AOP) languages. A concrete implementation has also been developed that uses the Agent Factory AOP framework as its base. This allows ACRE to be used with a number of different AOP languages, while providing a reference implementation that other integrations can be modelled upon. A standard is also proposed for the modelling and sharing of agent-focused interaction protocols that is independent of the platform within which a concrete ACRE implementation is run. Finally, a user evaluation illustrates the benefits of incorporating conversation management into agent programming.
• To achieve general intelligence, agents must learn how to interact with others in a shared environment: this is the challenge of multiagent reinforcement learning (MARL). The simplest form is independent reinforcement learning (InRL), where each agent treats its experience as part of its (non-stationary) environment. In this paper, we first observe that policies learned using InRL can overfit to the other agents' policies during training, failing to sufficiently generalize during execution. We introduce a new metric, joint-policy correlation, to quantify this effect. We describe an algorithm for general MARL, based on approximate best responses to mixtures of policies generated using deep reinforcement learning, and empirical game-theoretic analysis to compute meta-strategies for policy selection. The algorithm generalizes previous ones such as InRL, iterated best response, double oracle, and fictitious play. Then, we present a scalable implementation which reduces the memory requirement using decoupled meta-solvers. Finally, we demonstrate the generality of the resulting policies in two partially observable settings: gridworld coordination games and poker.
• For a group of cooperating UAVs, localizing each other is often a key task. This paper studies the localization problem for a group of UAVs flying in 3D space with very limited information, i.e., when noisy distance measurements are the only type of inter-agent sensing that is available, and when only one UAV knows a global coordinate basis, the others being GPS-denied. Initially for a two-agent problem, but easily generalized to some multi-agent problems, constraints are established on the minimum number of distance measurements required to achieve the localization. The paper also proposes an algorithm based on semidefinite programming (SDP), followed by maximum likelihood estimation using a gradient descent initialized from the SDP calculation. The efficacy of the algorithm is verified with experimental noisy flight data.
• Nov 02 2017 cs.CC cs.GT cs.MA arXiv:1711.00201v1
We believe that economic design and computational complexity---while already important to each other---should become even more important to each other with each passing year. But for that to happen, experts in on the one hand such areas as social choice, economics, and political science and on the other hand computational complexity will have to better understand each other's worldviews. This article, written by two complexity theorists who also work in computational social choice theory, focuses on one direction of that process by presenting a brief overview of how most computational complexity theorists view the world. Although our immediate motivation is to make the lens through which complexity theorists see the world be better understood by those in the social sciences, we also feel that even within computer science it is very important for nontheoreticians to understand how theoreticians think, just as it is equally important within computer science for theoreticians to understand how nontheoreticians think.
• We propose a multiagent distributed actor-critic algorithm for multitask reinforcement learning (MRL), named Diff-DAC. The agents are connected, forming a (possibly sparse) network. Each agent is assigned a task and has access to data from this local task only. During the learning process, the agents are able to communicate some parameters to their neighbors. Since the agents incorporate their neighbors' parameters into their own learning rules, the information is diffused across the network, and they can learn a common policy that generalizes well across all tasks. Diff-DAC is scalable since the computational complexity and communication overhead per agent grow with the number of neighbors, rather than with the total number of agents. Moreover, the algorithm is fully distributed in the sense that agents self-organize, with no need for a coordinator node. Diff-DAC follows an actor-critic scheme where the value function and the policy are approximated with deep neural networks, being able to learn expressive policies from raw data. As a by-product of Diff-DAC's derivation from duality theory, we provide novel insights into the standard actor-critic framework, showing that it is actually an instance of the dual ascent method to approximate the solution of a linear program. Experiments illustrate the performance of the algorithm in the cart-pole, inverted pendulum, and swing-up cart-pole environments.
• When voting on a proposal one in fact chooses between two alternatives: (i) A new hypothetical social state depicted by the proposal and (ii) the status quo (henceforth: Reality); a Yes vote favors a transition to the proposed hypothetical state, while a No vote favors Reality. Social Choice theory generalizes voting on one proposal to ranking multiple proposed alternatives; we regret that during this generalization, Reality was neglected. Here we propose to rectify this state of affairs by incorporating Reality into Social Choice. We do so by recognizing Reality as an ever present, always relevant, evolving social state, which is distinguished from hypothetical social states, and explore the ramifications of this recognition. We argue that incorporating Reality into Social Choice is natural, even essential, and show that doing so necessitates revisiting its foundation, as Arrow's theorem and the Condorcet voting paradox do not carry over. We explore the plethora of research directions opened by taking Reality into consideration: New models of Reality-aware Social Choice (we present three, from the most abstract to more concrete, with their associated axioms); new Reality-aware voting rules (we present voting rules that are simple to communicate and to implement); new concepts (we present democratic action plans, an extension of democratic decisions); and new game-theoretic questions related to strategic voting (we discuss one Reality-based game). Arrow's theorem was taken to show that democracy, conceived as government by the will of the people, is an incoherent illusion. As Reality-aware Social Choice renders Arrow's theorem vacuous and resolves the Condorcet voting paradox, it may clear this intellectual blemish on democracy; pave the way for a broad application of ranked voting according to the Condorcet criterion; and, more generally, may help restore trust in democracy.
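As background for the Condorcet voting paradox mentioned in the abstract above, the sketch below computes pairwise majorities for a ranked-ballot profile and reports the Condorcet winner when one exists. The classic three-voter cyclic profile has none, which is the paradox in its standard form. This illustrates only the standard setting; it does not implement the paper's Reality-aware rules, and the function names are hypothetical.

```python
def pairwise_support(ballots, a, b):
    """Number of ballots that rank candidate `a` above candidate `b`."""
    return sum(1 for ranking in ballots if ranking.index(a) < ranking.index(b))


def condorcet_winner(ballots):
    """Return the candidate beating every other by strict majority, else None.

    Each ballot is a full ranking (best first) over the same candidates.
    """
    candidates = ballots[0]
    for c in candidates:
        if all(2 * pairwise_support(ballots, c, other) > len(ballots)
               for other in candidates if other != c):
            return c
    return None


# Classic cyclic profile: A beats B, B beats C, and C beats A, each 2 to 1.
cyclic_profile = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]
```

With the cyclic profile every candidate loses one pairwise contest, so `condorcet_winner` returns `None`.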
• In the ICS, WUT a platform for simulation of cooperation of physical and virtual mobile agents is under development. The paper describes the motivation of the research, an organization of the platform, a model of agent, and the principles of design of the platform. Several experimental simulations are briefly described.
• A challenge in multiagent control systems is to ensure that they are appropriately resilient to communication failures between the various agents. In many common game-theoretic formulations of these types of systems, it is implicitly assumed that all agents have access to as much information about other agents' actions as needed. This paper endeavors to augment these game-theoretic methods with policies that would allow agents to react on-the-fly to losses of this information. Unfortunately, we show that even if a single agent loses communication with one other weakly-coupled agent, this can cause arbitrarily-bad system states to emerge as various solution concepts of an associated game, regardless of how the agent accounts for the communication failure and regardless of how weakly coupled the agents are. Nonetheless, we show that the harm that communication failures can cause is limited by the structure of the problem; when agents' action spaces are richer, problems are more susceptible to these types of pathologies. Finally, we undertake an initial study into how a system designer might prevent these pathologies, and explore a few limited settings in which communication failures cannot cause harm.
• Since the publication of 'Complex Contagions and the Weakness of Long Ties' in 2007, complex contagions have been studied across an enormous variety of social domains. In reviewing this decade of research, we discuss recent advancements in applied studies of complex contagions, particularly in the domains of health, innovation diffusion, social media, and politics. We also discuss how these empirical studies have spurred complementary advancements in the theoretical modeling of contagions, which concern the effects of network topology on diffusion, as well as the effects of individual-level attributes and thresholds. In synthesizing these developments, we suggest three main directions for future research. The first concerns the study of how multiple contagions interact within the same network and across networks, in what may be called an ecology of contagions. The second concerns the study of how the structure of thresholds and their behavioral consequences can vary by individual and social context. The third area concerns the roles of diversity and homophily in the dynamics of complex contagion, including both diversity of demographic profiles among local peers, and the broader notion of structural diversity within a network. Throughout this discussion, we make an effort to highlight the theoretical and empirical opportunities that lie ahead.
• A key challenge in multi-robot and multi-agent systems is generating solutions that are robust to other self-interested or even adversarial parties who actively try to prevent the agents from achieving their goals. The practicality of existing works addressing this challenge is limited to only small-scale synchronous decision-making scenarios or a single agent planning its best response against a single adversary with fixed, procedurally characterized strategies. In contrast this paper considers a more realistic class of problems where a team of asynchronous agents with limited observation and communication capabilities need to compete against multiple strategic adversaries with changing strategies. This problem necessitates agents that can coordinate to detect changes in adversary strategies and plan the best response accordingly. Our approach first optimizes a set of stratagems that represent these best responses. These optimized stratagems are then integrated into a unified policy that can detect and respond when the adversaries change their strategies. The near-optimality of the proposed framework is established theoretically as well as demonstrated empirically in simulation and hardware.
• This paper deals with a new type of warehousing system, Robotic Mobile Fulfillment Systems (RMFS). In such systems, robots are sent to carry storage units, so-called "pods", from the inventory and bring them to human operators working at stations. At the stations, the items are picked according to customers' orders. There exist new decision problems in such systems, for example, the reallocation of pods after their visits at work stations or the selection of pods to fulfill orders. In order to analyze decision strategies for these decision problems and relations between them, we develop a simulation framework called "RAWSim-O" in this paper. Moreover, we show a real-world application of our simulation framework by integrating simple robot prototypes based on vacuum cleaning robots.
• In multi-agent navigation, agents need to move towards their goal locations while avoiding collisions with other agents and static obstacles, often without communication with each other. Existing methods compute motions that are optimal locally but do not account for the aggregated motions of all agents, producing inefficient global behavior especially when agents move in a crowded space. In this work, we develop methods to allow agents to dynamically adapt their behavior to their local conditions. We accomplish this by formulating the multi-agent navigation problem as an action-selection problem, and propose an approach, ALAN, that allows agents to compute time-efficient and collision-free motions. ALAN is highly scalable because each agent makes its own decisions on how to move using a set of velocities optimized for a variety of navigation tasks. Experimental results show that the agents using ALAN, in general, reach their destinations faster than using ORCA, a state-of-the-art collision avoidance framework, the Social Forces model for pedestrian navigation, and a Predictive collision avoidance model.
• We consider multi-agent stochastic optimization problems over reproducing kernel Hilbert spaces (RKHS). In this setting, a network of interconnected agents aims to learn decision functions, i.e., nonlinear statistical models, that are optimal in terms of a global convex functional that aggregates data across the network, with only access to locally and sequentially observed samples. We propose solving this problem by allowing each agent to learn a local regression function while enforcing consensus constraints. We use a penalized variant of functional stochastic gradient descent operating simultaneously with low-dimensional subspace projections. These subspaces are constructed greedily by applying orthogonal matching pursuit to the sequence of kernel dictionaries and weights. By tuning the projection-induced bias, we propose an algorithm that allows for each individual agent to learn, based upon its locally observed data stream and message passing with its neighbors only, a regression function that is close to the globally optimal regression function. That is, we establish that with constant step-size selections agents' functions converge to a neighborhood of the globally optimal one while satisfying the consensus constraints as the penalty parameter is increased. Moreover, the complexity of the learned regression functions is guaranteed to remain finite. On both multi-class kernel logistic regression and multi-class kernel support vector classification with data generated from class-dependent Gaussian mixture models, we observe stable function estimation and state of the art performance for distributed online multi-class classification. Experiments on the Brodatz textures further substantiate the empirical validity of this approach.
• Explanation of the hot topic "multi-agent path finding".
• The paper reports on some results concerning Aqvist's dyadic logic known as system G, which is one of the most influential logics for reasoning with dyadic obligations ("it ought to be the case that ... if it is the case that ..."). Although this logic has been known in the literature for a while, many of its properties still await in-depth consideration. In this short paper we show: that any formula in system G including nested modal operators is equivalent to some formula with no nesting; that the universal modality introduced by Aqvist in the first presentation of the system is definable in terms of the deontic modality.
• Trajectory interpolation, the process of filling in the gaps and removing noise from observed agent trajectories, is an essential task for motion inference in multi-agent settings. A desired trajectory interpolation method should be robust to noise and to changes in environments or agent densities, while also yielding realistic group movement behaviors. Such realistic behaviors are, however, challenging to model as they require avoidance of agent-agent or agent-environment collisions and, at the same time, seek computational efficiency. In this paper, we propose a novel framework composed of data-driven priors (local, global or combined) and an efficient optimization strategy for multi-agent trajectory interpolation. The data-driven priors implicitly encode the dependencies of movements of multiple agents and the collision-avoiding desiderata, enabling elimination of costly pairwise collision constraints and resulting in reduced computational complexity and often improved estimation. Various combinations of priors and optimization algorithms are evaluated in comprehensive simulated experiments. Our experimental results reveal important insights, including the significance of the global flow prior and the lesser-than-expected influence of data-driven collision priors.
• A smart grid can be considered as a complex network where each node represents a generation unit or a consumer, whereas links can be used to represent transmission lines. One way to study complex systems is by using the agent-based modeling (ABM) paradigm. An ABM is a way of representing a complex system of autonomous agents interacting with each other. Previously, a number of studies have been presented in the smart grid domain making use of the ABM paradigm. However, to the best of our knowledge, none of these studies have focused on the specification aspect of ABM. An ABM specification is important not only for understanding but also for replication of the model. In this study, we focus on development as well as specification of ABM for smart grid. We propose an ABM by using a combination of agent-based and complex network-based approaches. For ABM specification, we use ODD and DREAM specification approaches. We analyze these two specification approaches qualitatively as well as quantitatively. Extensive experiments demonstrate that DREAM is a more useful approach than ODD for modeling as well as for replication of models for smart grid.
• Within the area of multi-agent systems, normative systems are a widely used framework for the coordination of interdependent activities. A crucial problem associated with normative systems is that of synthesising norms that effectively accomplish a coordination task and whose compliance forms a rational choice for the agents within the system. In this work, we introduce a framework for the synthesis of normative systems that effectively coordinate a multi-agent system and whose norms are likely to be adopted by rational agents. Our approach is rooted in evolutionary game theory. Our framework considers multi-agent systems in which evolutionary forces lead successful norms to prosper and spread within the agent population, while unsuccessful norms are discarded. The outputs of this evolutionary norm synthesis process are normative systems whose compliance forms a rational choice for the agents. We show the effectiveness of our approach through an empirical evaluation in a simulated traffic domain.
• While reigning models of diffusion have privileged the structure of a given social network as the key to informational exchange, real human interactions do not appear to take place on a single graph of connections. Using data collected from a pilot study of the spread of HIV awareness in social networks of homeless youth, we show that health information did not diffuse in the field according to the processes outlined by dominant models. Since physical network diffusion scenarios often diverge from their more well-studied counterparts on digital networks, we propose an alternative Activation Jump Model (AJM) that describes information diffusion on physical networks from a multi-agent team perspective. Our model exhibits two main differentiating features from leading cascade and threshold models of influence spread: 1) The structural composition of a seed set team impacts each individual node's influencing behavior, and 2) an influencing node may spread information to non-neighbors. We show that the AJM significantly outperforms existing models in its fit to the observed node-level influence data on the youth networks. We then prove theoretical results, showing that the AJM exhibits many well-behaved properties shared by dominant models. Our results suggest that the AJM presents a flexible and more accurate model of network diffusion that may better inform influence maximization in the field.
• Developing a safe and efficient collision avoidance policy for multiple robots is challenging in decentralized scenarios where each robot generates its paths without observing other robots' states and intents. While other distributed multi-robot collision avoidance systems exist, they often require extracting agent-level features to plan a local collision-free action, which can be computationally prohibitive and not robust. More importantly, in practice the performance of these methods is much lower than that of their centralized counterparts. We present a decentralized sensor-level collision avoidance policy for multi-robot systems, which directly maps raw sensor measurements to an agent's steering commands in terms of movement velocity. As a first step toward reducing the performance gap between decentralized and centralized methods, we present a multi-scenario multi-stage training framework to find an optimal policy, which is trained over a large number of robots in rich, complex environments simultaneously using a policy-gradient-based reinforcement learning algorithm. We validate the learned sensor-level collision avoidance policy in a variety of simulated scenarios with thorough performance evaluations and show that the final learned policy is able to find time-efficient, collision-free paths for a large-scale robot system. We also demonstrate that the learned policy generalizes well to new scenarios that do not appear at any point during training, including navigating a heterogeneous group of robots and a large-scale scenario with 100 robots. Videos are available at https://sites.google.com/view/drlmaca
• This paper focuses on two commonly used path assignment policies for agents traversing a congested network: self-interested routing and system-optimum routing. In the self-interested routing policy, each agent selects a path that optimizes its own utility, while in the system-optimum routing policy, agents are assigned paths with the goal of maximizing system performance. This paper considers a scenario where a centralized network manager wishes to optimize utilities over all agents, i.e., implement a system-optimum routing policy. In many real-life scenarios, however, the system manager is unable to influence the route assignment of all agents due to limited influence on route choice decisions. Motivated by such scenarios, a computationally tractable method is presented that computes the minimal number of agents that the system manager needs to influence (compliant agents) in order to achieve system-optimal performance. Moreover, this methodology can also determine whether a given set of compliant agents is sufficient to achieve system optimum and compute the optimal route assignment for the compliant agents to do so. Experimental results are presented showing that in several large-scale, realistic traffic networks optimal flow can be achieved with as few as 13% and up to 54% of the agents being compliant.
• Monte Carlo Tree Search (MCTS) has been extended to many imperfect information games. However, due to the added complexity that uncertainty introduces, these adaptations have not reached the same level of practical success as their perfect information counterparts. In this paper we consider the development of agents that perform well against humans in imperfect information games with partially observable actions. We introduce the Semi-Determinized-MCTS (SDMCTS), a variant of the Information Set MCTS algorithm (ISMCTS). More specifically, SDMCTS generates a predictive model of the unobservable portion of the opponent's actions from historical behavioral data. Next, SDMCTS performs simulations on an instance of the game where the unobservable portion of the opponent's actions is determined. It thereby facilitates the use of the predictive model in order to decrease uncertainty. We present an implementation of the SDMCTS applied to the Cheat Game, a well-known card game, with partially observable (and often deceptive) actions. Results from experiments with 120 subjects playing a head-to-head Cheat Game against our SDMCTS agents suggest that SDMCTS performs well against humans, and its performance improves as the predictive model's accuracy increases.
• This paper provides an overview of evolutionary robotics techniques applied to on-line distributed evolution for robot collectives -- namely, embodied evolution. It provides a definition of embodied evolution as well as a thorough description of the underlying concepts and mechanisms. The paper also presents a comprehensive summary of research published in the field since its inception (1999-2017), providing various perspectives to identify the major trends. In particular, we identify a shift from considering embodied evolution as a parallel search method within small robot collectives (fewer than 10 robots) to embodied evolution as an on-line distributed learning method for designing collective behaviours in swarm-like collectives. The paper concludes with a discussion of applications and open questions, providing a milestone for past and an inspiration for future research.
• This work studies the labeled multi-robot path and motion planning problem in continuous domains, in the absence of static obstacles. Given $n$ robots which may be arbitrarily close to each other and assuming random start and goal configurations for the robots, we derive an $O(n^3)$ complete algorithm that produces solutions with constant-factor optimality guarantees on both makespan and distance optimality, in expectation. Furthermore, our algorithm only requires a small constant-factor expansion of the initial and goal configuration footprints for solving the problem. In addition to strong theoretical guarantees, we present a thorough computational evaluation of the proposed solution. Beyond the baseline solution, adapting an effective (but non-polynomial time) robot routing subroutine, we also provide a highly efficient implementation that quickly computes near-optimal solutions. Hardware experiments on the microMVP platform, composed of non-holonomic robots, confirm the practical applicability of our algorithmic pipeline.
• Existing socio-psychological studies suggest that users of a social network form their opinions relying on the opinions of their neighbors. According to the DeGroot opinion formation model, one value of particular importance is the asymptotic consensus value, that is, the sum of user opinions weighted by the users' eigenvector centralities. This value plays the role of an attractor for the opinions in the network and is a lucrative target for external influence. However, since any potentially malicious control of the opinion distribution in a social network is clearly undesirable, it is important to design methods to prevent external attempts to strategically change the asymptotic consensus value. In this work, we assume that the adversary wants to maximize the asymptotic consensus value by altering the opinions of some users in a network; we then state DIVER, an NP-hard problem of disabling such external influence attempts by strategically adding a limited number of edges to the network. Relying on the theory of Markov chains, we provide a perturbation analysis that shows how eigenvector centrality and, hence, DIVER's objective function change in response to an edge's addition to the network. This analysis leads to the design of a pseudo-linear-time heuristic for DIVER, whose computation relies on efficient estimation of mean first passage times in a Markov chain. We confirm our theoretical findings in experiments.
• Much research in artificial intelligence is concerned with the development of autonomous agents that can interact effectively with other agents. An important aspect of such agents is the ability to reason about the behaviours of other agents, by constructing models which make predictions about various properties of interest (such as actions, goals, beliefs) of the modelled agents. A variety of modelling approaches now exist which vary widely in their methodology and underlying assumptions, catering to the needs of the different sub-communities within which they were developed and reflecting the different practical uses for which they are intended. The purpose of the present article is to provide a comprehensive survey of the salient modelling methods which can be found in the literature. The article concludes with a discussion of open problems which may form the basis for fruitful future research.
• The introduction of autonomous vehicles (AVs) will have far-reaching effects on road traffic in cities and on highways. The implementation of an automated highway system (AHS), possibly with a dedicated lane only for AVs, is believed to be a requirement to maximise the benefit from the advantages of AVs. We study the ramifications of an increasing percentage of AVs on the traffic system with and without the introduction of a dedicated AV lane on highways. We conduct an analytical evaluation of a simplified scenario and a macroscopic simulation of the city of Singapore under user equilibrium conditions with a realistic traffic demand. We present findings regarding average travel time, fuel consumption, throughput and road usage. Instead of only considering the highways, we also focus on the effects on the remaining road network. Our results show a reduction of average travel time and fuel consumption as a result of increasing the portion of AVs in the system. We show that the introduction of an AV lane is not beneficial in terms of average commute time. Examining the effects on the AV population only, however, the AV lane provides a considerable reduction of travel time (approx. 25%) at the price of delaying conventional vehicles (approx. 7%). Furthermore, a notable shift of travel demand away from the highways towards major and small roads is noticed in early stages of AV penetration of the system. Finally, our findings show that after a certain threshold percentage of AVs the differences between the AV lane and no AV lane scenarios become negligible.
• Swarm systems constitute a challenging problem for reinforcement learning (RL) as the algorithm needs to learn decentralized control policies that can cope with limited local sensing and communication abilities of the agents. Although there have been recent advances of deep RL algorithms applied to multi-agent systems, learning communication protocols while simultaneously learning the behavior of the agents is still beyond the reach of deep RL algorithms. However, while it is often difficult to directly define the behavior of the agents, simple communication protocols can be defined more easily using prior knowledge about the given task. In this paper, we propose a number of simple communication protocols that can be exploited by deep reinforcement learning to find decentralized control policies in a multi-robot swarm environment. The protocols are based on histograms that encode the local neighborhood relations of the agents and can also transmit task-specific information, such as the shortest distance and direction to a desired target. In our framework, we use an adaptation of Trust Region Policy Optimization to learn complex collaborative tasks, such as formation building, building a communication link, and pushing an intruder. We evaluate our findings in a simulated 2D-physics environment, and compare the implications of different communication protocols.
• The incorporation of macro-actions (temporally extended actions) into multi-agent decision problems has the potential to address the curse of dimensionality associated with such decision problems. Since macro-actions last for stochastic durations, multiple agents executing decentralized policies in cooperative environments must act asynchronously. We present an algorithm that modifies Generalized Advantage Estimation for temporally extended actions, allowing a state-of-the-art policy optimization algorithm to optimize policies in Dec-POMDPs in which agents act asynchronously. We show that our algorithm is capable of learning optimal policies in two cooperative domains, one involving real-time bus holding control and one involving wildfire fighting with unmanned aircraft. Our algorithm works by framing problems as "event-driven decision processes," which are scenarios where the sequence and timing of actions and events are random and governed by an underlying stochastic process. In addition to optimizing policies with continuous state and action spaces, our algorithm also facilitates the use of event-driven simulators, which do not require time to be discretized into time-steps. We demonstrate the benefit of using event-driven simulation in the context of multiple agents taking asynchronous actions. We show that fixed time-step simulation risks obfuscating the sequence in which closely-separated events occur, adversely affecting the policies learned. Additionally, we show that arbitrarily shrinking the time-step scales poorly with the number of agents.
• Inspired by biological swarms, robotic swarms are envisioned to solve real-world problems that are difficult for individual agents. Biological swarms can achieve collective intelligence based on local interactions and simple rules; however, designing effective distributed policies for large-scale robotic swarms to achieve a global objective can be challenging. Although it is often possible to design an optimal centralized strategy for smaller numbers of agents, those methods can fail as the number of agents increases. Motivated by the growing success of machine learning, we develop a deep learning approach that learns distributed coordination policies from centralized policies. In contrast to traditional distributed control approaches, which are usually based on human-designed policies for relatively simple tasks, this learning-based approach can be adapted to more difficult tasks. We demonstrate the efficacy of our proposed approach on two different tasks, the well-known rendezvous problem and a more difficult particle assignment problem. For the latter, no known distributed policy exists. From extensive simulations, it is shown that the performance of the learned coordination policies is comparable to the centralized policies, surpassing state-of-the-art distributed policies. Thereby, our proposed approach provides a promising alternative for real-world coordination problems that would be otherwise computationally expensive to solve or intangible to explore.
• In this paper, we investigate how to learn to control a group of cooperative agents with limited sensing capabilities, such as robot swarms. The agents have only very basic sensor capabilities, yet in a group they can accomplish sophisticated tasks, such as distributed assembly or search and rescue tasks. Learning a policy for a group of agents is difficult due to distributed partial observability of the state. Here, we follow a guided approach where a critic has central access to the global state during learning, which simplifies the policy evaluation problem from a reinforcement learning point of view. For example, we can get the positions of all robots of the swarm using a camera image of a scene. This camera image is only available to the critic and not to the control policies of the robots. We follow an actor-critic approach, where the actors base their decisions only on locally sensed information. In contrast, the critic is learned based on the true global state. Our algorithm uses deep reinforcement learning to approximate both the Q-function and the policy. The performance of the algorithm is evaluated on two tasks with simple simulated 2D agents: 1) finding and maintaining a certain distance to each other and 2) locating a target.
• Cooperative motion planning is still a challenging task for robots. Recently, Value Iteration Networks (VINs) were proposed to model motion planning tasks as neural networks. In this work, we extend VINs to solve cooperative planning tasks under non-holonomic constraints. For this, we interconnect multiple VINs so that each takes the others' outputs into account. Policies for cooperation are generated via iterative gradient descent. Validation in simulation shows that the resulting networks can resolve non-holonomic motion planning problems that require cooperation.
• A generic architecture for a class of distributed robotic systems is presented. The architecture supports openness and heterogeneity, i.e. heterogeneous components may be added to and removed from the system without affecting its basic functionality. The architecture is based on the paradigm of Service Oriented Architecture (SOA) and a generic representation (ontology) of the environment. A device (e.g. a robot) is seen as a collection of its capabilities exposed as services. Generic protocols for publishing, discovering, and arranging services are proposed for creating composite services that can accomplish complex tasks in an automatic way. Generic protocols for the execution of composite services are also proposed, along with simple protocols for monitoring the executions and for recovery from failures. A software platform built on a multi-robot system (according to the proposed architecture) is a multi-agent system.
• We address a problem of area protection in graph-based scenarios with multiple mobile agents where connectivity is maintained among agents to ensure they can communicate. The problem consists of two adversarial teams of agents that move in an undirected graph shared by both teams. Agents are placed in vertices of the graph; at most one agent can occupy a vertex; and they can move into adjacent vertices in a conflict-free way. Teams have asymmetric goals: the aim of one team (the attackers) is to invade a given area, while the aim of the opposing team (the defenders) is to protect the area from being entered by attackers by occupying selected vertices. The team of defenders needs to maintain connectivity of the vertices occupied by its own agents in a visibility graph. The visibility graph models the possibility of communication between pairs of vertices. We study strategies for allocating vertices to be occupied by the team of defenders to block attacking agents while connectivity is maintained at the same time. To do this, we reserve a subset of defending agents that do not try to block the attackers but instead are placed to support connectivity of the team. The performance of the strategies is tested in multiple benchmarks. The success of a strategy is heavily dependent on the type of the instance, and so one of the contributions of this work is that we identify suitable strategies for diverse instance types.
• This paper presents a model-free reinforcement learning (RL) based distributed control protocol for leader-follower multi-agent systems. Although RL has been successfully used to learn optimal control protocols for multi-agent systems, the effects of adversarial inputs are ignored. It is shown in this paper, however, that their adverse effects can propagate across the network and impact the learning outcome of other intact agents. To alleviate this problem, a unified RL-based distributed control framework is developed for both homogeneous and heterogeneous multi-agent systems to prevent corrupted sensory data from propagating across the network. To this end, only the leader communicates its actual sensory information, and other agents estimate the leader state using a distributed observer and communicate this estimation to their neighbors to achieve consensus on the leader state. The observer cannot be physically affected by any adversarial input. To further improve resiliency, distributed H-infinity control protocols are designed to attenuate the effect of the adversarial inputs on the compromised agent itself. An off-policy RL algorithm is developed to learn the solutions of the game algebraic Riccati equations arising from solving the H-infinity control problem. No knowledge of the agent dynamics is required, and it is shown that the proposed RL-based H-infinity control protocol is resilient against adversarial inputs.
• We study the problem of allocating impressions to sellers in e-commerce websites, such as Amazon, eBay or Taobao, aiming to maximize the total revenue generated by the platform. When a buyer searches for a keyword, the website presents the buyer with a list of different sellers for this item, together with the corresponding prices. This can be seen as an instance of a resource allocation problem in which the sellers choose their prices at each step and the platform decides how to allocate the impressions, based on the chosen prices and the historical transactions of each seller. Due to the complexity of the system, most e-commerce platforms employ heuristic allocation algorithms that mainly depend on the sellers' transaction records and do not take the rationality of the sellers into account, which makes them susceptible to several price manipulations. In this paper, we put forward a general framework for designing impression allocation algorithms in e-commerce websites given any behavioural model for the sellers, using deep reinforcement learning. The impression allocation problem is modeled as a Markov decision process, where the states encode the history of impressions, prices, transactions and generated revenue, and the actions are the possible impression allocations at each round. To tackle the problem of continuity and high dimensionality of states and actions, we adopt the ideas of the DDPG algorithm to design an actor-critic policy gradient algorithm which takes advantage of the problem domain in order to achieve convergence and stability. Our algorithm is compared against natural heuristics and it outperforms all of them in terms of the total revenue generated. Finally, contrary to the DDPG algorithm, our algorithm is robust to settings with variable sellers and converges easily.
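
As an illustration of the DeGroot entry above (the one introducing DIVER), the asymptotic consensus value can be sketched in a few lines of plain Python. This is a minimal sketch, not the authors' code: the function names, the 3-user weight matrix `W` and the initial opinions `x0` are hypothetical, and it assumes a row-stochastic, irreducible and aperiodic trust matrix, so that the eigenvector centralities coincide with the stationary distribution of the associated Markov chain.

```python
def degroot_step(weights, opinions):
    """One DeGroot update: each user adopts the weighted average of the opinions it listens to."""
    return [sum(w * x for w, x in zip(row, opinions)) for row in weights]


def stationary_weights(weights, iterations=1000):
    """Approximate the left eigenvector pi (pi = pi W, entries summing to 1) by power iteration."""
    n = len(weights)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * weights[i][j] for i in range(n)) for j in range(n)]
    return pi


def consensus_value(weights, opinions):
    """Asymptotic consensus value: initial opinions weighted by eigenvector centrality."""
    pi = stationary_weights(weights)
    return sum(p * x for p, x in zip(pi, opinions))


# Hypothetical 3-user network (assumption for illustration); every row sums to 1.
W = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
x0 = [1.0, 0.0, 0.0]  # hypothetical initial opinions

predicted = consensus_value(W, x0)

# Cross-check by simulating the opinion dynamics directly.
x = list(x0)
for _ in range(200):
    x = degroot_step(W, x)
```

In this toy network the centralities are (0.25, 0.5, 0.25), so the opinions settle at 0.25. An adversary who changes `x0` for a high-centrality user shifts that value the most, which is why edge additions that alter the centralities can serve as a defence.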