- The goal of this paper is to present an end-to-end, data-driven framework to control Autonomous Mobility-on-Demand systems (AMoD, i.e. fleets of self-driving vehicles). We first model the AMoD system using a time-expanded network, and present a formulation that computes the optimal rebalancing strategy (i.e., preemptive repositioning) and the minimum feasible fleet size for a given travel demand. Then, we adapt this formulation to devise a Model Predictive Control (MPC) algorithm that leverages short-term demand forecasts based on historical data to compute rebalancing strategies. We test the end-to-end performance of this controller with a state-of-the-art LSTM neural network to predict customer demand and real customer data from DiDi Chuxing: we show that this approach scales very well for large systems (indeed, the computational complexity of the MPC algorithm does not depend on the number of customers and of vehicles in the system) and outperforms state-of-the-art rebalancing strategies by reducing the mean customer wait time by up to to 89.6%.
- Decentralized control of robots has attracted huge research interests. However, some of the research used unrealistic assumptions without collision avoidance. This report focuses on the collision-free control for multiple robots in both complete coverage and search tasks in 2D and 3D areas which are arbitrary unknown. All algorithms are decentralized as robots have limited abilities and they are mathematically proved. The report starts with the grid selection in the two tasks. Grid patterns simplify the representation of the area and robots only need to move straightly between neighbor vertices. For the 100% complete 2D coverage, the equilateral triangular grid is proposed. For the complete coverage ignoring the boundary effect, the grid with the fewest vertices is calculated in every situation for both 2D and 3D areas. The second part is for the complete coverage in 2D and 3D areas. A decentralized collision-free algorithm with the above selected grid is presented driving robots to sections which are furthest from the reference point. The area can be static or expanding, and the algorithm is simulated in MATLAB. Thirdly, three grid-based decentralized random algorithms with collision avoidance are provided to search targets in 2D or 3D areas. The number of targets can be known or unknown. In the first algorithm, robots choose vacant neighbors randomly with priorities on unvisited ones while the second one adds the repulsive force to disperse robots if they are close. In the third algorithm, if surrounded by visited vertices, the robot will use the breadth-first search algorithm to go to one of the nearest unvisited vertices via the grid. The second search algorithm is verified on Pioneer 3-DX robots. The general way to generate the formula to estimate the search time is demonstrated. Algorithms are compared with five other algorithms in MATLAB to show their effectiveness.
- This paper deals with a new type of warehousing system, Robotic Mobile Fulfillment Systems (RMFS). In such systems, robots are sent to carry storage units, so-called "pods", from the inventory and bring them to human operators working at stations. At the stations, the items are picked according to customers' orders. There exist new decision problems in such systems, for example, the reallocation of pods after their visits at work stations or the selection of pods to fulfill orders. In order to analyze decision strategies for these decision problems and relations between them, we develop a simulation framework called "RAWSim-O" in this paper. Moreover, we show a real-world application of our simulation framework by integrating simple robot prototypes based on vacuum cleaning robots.
- In multi-agent navigation, agents need to move towards their goal locations while avoiding collisions with other agents and static obstacles, often without communication with each other. Existing methods compute motions that are optimal locally but do not account for the aggregated motions of all agents, producing inefficient global behavior especially when agents move in a crowded space. In this work, we develop methods to allow agents to dynamically adapt their behavior to their local conditions. We accomplish this by formulating the multi-agent navigation problem as an action-selection problem, and propose an approach, ALAN, that allows agents to compute time-efficient and collision-free motions. ALAN is highly scalable because each agent makes its own decisions on how to move using a set of velocities optimized for a variety of navigation tasks. Experimental results show that the agents using ALAN, in general, reach their destinations faster than using ORCA, a state-of-the-art collision avoidance framework, the Social Forces model for pedestrian navigation, and a Predictive collision avoidance model.
- We consider multi-agent stochastic optimization problems over reproducing kernel Hilbert spaces (RKHS). In this setting, a network of interconnected agents aims to learn decision functions, i.e., nonlinear statistical models, that are optimal in terms of a global convex functional that aggregates data across the network, with only access to locally and sequentially observed samples. We propose solving this problem by allowing each agent to learn a local regression function while enforcing consensus constraints. We use a penalized variant of functional stochastic gradient descent operating simultaneously with low-dimensional subspace projections. These subspaces are constructed greedily by applying orthogonal matching pursuit to the sequence of kernel dictionaries and weights. By tuning the projection-induced bias, we propose an algorithm that allows for each individual agent to learn, based upon its locally observed data stream and message passing with its neighbors only, a regression function that is close to the globally optimal regression function. That is, we establish that with constant step-size selections agents' functions converge to a neighborhood of the globally optimal one while satisfying the consensus constraints as the penalty parameter is increased. Moreover, the complexity of the learned regression functions is guaranteed to remain finite. On both multi-class kernel logistic regression and multi-class kernel support vector classification with data generated from class-dependent Gaussian mixture models, we observe stable function estimation and state of the art performance for distributed online multi-class classification. Experiments on the Brodatz textures further substantiate the empirical validity of this approach.
- Explanation of the hot topic "multi-agent path finding".
- The paper reports on some results concerning Aqvist's dyadic logic known as system G, which is one of the most influential logics for reasoning with dyadic obligations ("it ought to be the case that ... if it is the case that ..."). Although this logic has been known in the literature for a while, many of its properties still await in-depth consideration. In this short paper we show: that any formula in system G including nested modal operators is equivalent to some formula with no nesting; that the universal modality introduced by Aqvist in the first presentation of the system is definable in terms of the deontic modality.
- Oct 11 2017 cs.MA arXiv:1710.03354v1Trajectory interpolation, the process of filling-in the gaps and removing noise from observed agent trajectories, is an essential task for the motion inference in multi-agent setting. A desired trajectory interpolation method should be robust to noise, changes in environments or agent densities, while also being yielding realistic group movement behaviors. Such realistic behaviors are, however, challenging to model as they require avoidance of agent-agent or agent-environment collisions and, at the same time, seek computational efficiency. In this paper, we propose a novel framework composed of data-driven priors (local, global or combined) and an efficient optimization strategy for multi-agent trajectory interpolation. The data-driven priors implicitly encode the dependencies of movements of multiple agents and the collision-avoiding desiderata, enabling elimination of costly pairwise collision constraints and resulting in reduced computational complexity and often improved estimation. Various combinations of priors and optimization algorithms are evaluated in comprehensive simulated experiments. Our experimental results reveal important insights, including the significance of the global flow prior and the lesser-than-expected influence of data-driven collision priors.
- Smart grid can be considered as a complex network where each node represents generation unit, producer, and links represent transmission lines. Agent-based modeling (ABM) is a paradigm represents a complex system of autonomous agents and their interactions with each other. Previously, numbers of study have been presented in smart grid domain using ABM methodologies. However, these study lack in following any ABM specification approach. The purely text based specification tends to be ambiguous and make the model difficult to be replicated. In this paper, we present an ABM for smart grid using a combination of agent-based and complex network based approaches. Then we present ODD and DREAM specification approach for our ABM in the smart grid. We compare these two specification techniques qualitatively as well as quantitatively and conclude that DREAM approach is better than ODD in term of understanding and replicating ABM.
- Oct 03 2017 cs.MA arXiv:1710.00709v1Within the area of multi-agent systems, normative systems are a widely used framework for the coordination of interdependent activities. A crucial problem associated with normative systems is that of synthesising norms that effectively accomplish a coordination task and whose compliance forms a rational choice for the agents within the system. In this work, we introduce a framework for the synthesis of normative systems that effectively coordinate a multi-agent system and whose norms are likely to be adopted by rational agents. Our approach roots in evolutionary game theory. Our framework considers multi-agent systems in which evolutionary forces lead successful norms to prosper and spread within the agent population, while unsuccessful norms are discarded. The outputs of this evolutionary norm synthesis process are normative systems whose compliance forms a rational choice for the agents. We empirically show the effectiveness of our approach through empirical evaluation in a simulated traffic domain.
- While reigning models of diffusion have privileged the structure of a given social network as the key to informational exchange, real human interactions do not appear to take place on a single graph of connections. Using data collected from a pilot study of the spread of HIV awareness in social networks of homeless youth, we show that health information did not diffuse in the field according to the processes outlined by dominant models. Since physical network diffusion scenarios often diverge from their more well-studied counterparts on digital networks, we propose an alternative Activation Jump Model (AJM) that describes information diffusion on physical networks from a multi-agent team perspective. Our model exhibits two main differentiating features from leading cascade and threshold models of influence spread: 1) The structural composition of a seed set team impacts each individual node's influencing behavior, and 2) an influencing node may spread information to non-neighbors. We show that the AJM significantly outperforms existing models in its fit to the observed node-level influence data on the youth networks. We then prove theoretical results, showing that the AJM exhibits many well-behaved properties shared by dominant models. Our results suggest that the AJM presents a flexible and more accurate model of network diffusion that may better inform influence maximization in the field.
- Developing a safe and efficient collision avoidance policy for multiple robots is challenging in the decentralized scenarios where each robot generate its paths without observing other robots' states and intents. While other distributed multi-robot collision avoidance systems exist, they often require extracting agent-level features to plan a local collision-free action, which can be computationally prohibitive and not robust. More importantly, in practice the performance of these methods are much lower than their centralized counterparts. We present a decentralized sensor-level collision avoidance policy for multi-robot systems, which directly maps raw sensor measurements to an agent's steering commands in terms of movement velocity. As a first step toward reducing the performance gap between decentralized and centralized methods, we present a multi-scenario multi-stage training framework to find an optimal policy which is trained over a large number of robots on rich, complex environments simultaneously using a policy gradient based reinforcement learning algorithm. We validate the learned sensor-level collision avoidance policy in a variety of simulated scenarios with thorough performance evaluations and show that the final learned policy is able to find time efficient, collision-free paths for a large-scale robot system. We also demonstrate that the learned policy can be well generalized to new scenarios that do not appear in the entire training period, including navigating a heterogeneous group of robots and a large-scale scenario with 100 robots. Videos are available at https://sites.google.com/view/drlmaca
- This paper focuses on two commonly used path assignment policies for agents traversing a congested network: self-interested routing, and system-optimum routing. In the self-interested routing policy each agent selects a path that optimizes its own utility, while the system-optimum routing agents are assigned paths with the goal of maximizing system performance. This paper considers a scenario where a centralized network manager wishes to optimize utilities over all agents, i.e., implement a system-optimum routing policy. In many real-life scenarios, however, the system manager is unable to influence the route assignment of all agents due to limited influence on route choice decisions. Motivated by such scenarios, a computationally tractable method is presented that computes the minimal amount of agents that the system manager needs to influence (compliant agents) in order to achieve system optimal performance. Moreover, this methodology can also determine whether a given set of compliant agents is sufficient to achieve system optimum and compute the optimal route assignment for the compliant agents to do so. Experimental results are presented showing that in several large-scale, realistic traffic networks optimal flow can be achieved with as low as 13% of the agent being compliant and up to 54%.
- Monte Carlo Tree Search (MCTS) has been extended to many imperfect information games. However, due to the added complexity that uncertainty introduces, these adaptations have not reached the same level of practical success as their perfect information counterparts. In this paper we consider the development of agents that perform well against humans in imperfect information games with partially observable actions. We introduce the Semi-Determinized-MCTS (SDMCTS), a variant of the Information Set MCTS algorithm (ISMCTS). More specifically, SDMCTS generates a predictive model of the unobservable portion of the opponent's actions from historical behavioral data. Next, SDMCTS performs simulations on an instance of the game where the unobservable portion of the opponent's actions are determined. Thereby, it facilitates the use of the predictive model in order to decrease uncertainty. We present an implementation of the SDMCTS applied to the Cheat Game, a well-known card game, with partially observable (and often deceptive) actions. Results from experiments with 120 subjects playing a head-to-head Cheat Game against our SDMCTS agents suggest that SDMCTS performs well against humans, and its performance improves as the predictive model's accuracy increases.
- This paper provides an overview of evolutionary robotics techniques applied to on-line distributed evolution for robot collectives -- namely, embodied evolution. It provides a definition of embodied evolution as well as a thorough description of the underlying concepts and mechanisms. The paper also presents a comprehensive summary of research published in the field since its inception (1999-2017), providing various perspectives to identify the major trends. In particular, we identify a shift from considering embodied evolution as a parallel search method within small robot collectives (fewer than 10 robots) to embodied evolution as an on-line distributed learning method for designing collective behaviours in swarm-like collectives. The paper concludes with a discussion of applications and open questions, providing a milestone for past and an inspiration for future research.
- This work studies the labeled multi-robot path and motion planning problem in continuous domains, in the absence of static obstacles. Given $n$ robots which may be arbitrarily close to each other and assuming random start and goal configurations for the robots, we derived an $O(n^3)$, complete algorithm that produces solutions with constant-factor optimality guarantees on both makespan and distance optimality, in expectation. Furthermore, our algorithm only requires a small constant factor expansion of the initial and goal configuration footprints for solving the problem. In addition to strong theoretical guarantees, we present a thorough computational evaluation of the proposed solution. Beyond the baseline solution, adapting an effective (but non-polynomial time) robot routing subroutine, we also provide a highly efficient implementation that quickly computes near-optimal solutions. Hardware experiments on microMVP platform composed of non-holonomic robots confirms the practical applicability of our algorithmic pipeline.
- Existing socio-psychological studies suggest that users of a social network form their opinions relying on the opinions of their neighbors. According to DeGroot opinion formation model, one value of particular importance is the asymptotic consensus value---the sum of user opinions weighted by the users' eigenvector centralities. This value plays the role of an attractor for the opinions in the network and is a lucrative target for external influence. However, since any potentially malicious control of the opinion distribution in a social network is clearly undesirable, it is important to design methods to prevent the external attempts to strategically change the asymptotic consensus value. In this work, we assume that the adversary wants to maximize the asymptotic consensus value by altering the opinions of some users in a network; we, then, state DIVER---an NP-hard problem of disabling such external influence attempts by strategically adding a limited number of edges to the network. Relying on the theory of Markov chains, we provide perturbation analysis that shows how eigenvector centrality and, hence, DIVER's objective function change in response to an edge's addition to the network. The latter leads to the design of a pseudo-linear-time heuristic for DIVER, whose computation relies on efficient estimation of mean first passage times in a Markov chain. We confirm our theoretical findings in experiments.
- Much research in artificial intelligence is concerned with the development of autonomous agents that can interact effectively with other agents. An important aspect of such agents is the ability to reason about the behaviours of other agents, by constructing models which make predictions about various properties of interest (such as actions, goals, beliefs) of the modelled agents. A variety of modelling approaches now exist which vary widely in their methodology and underlying assumptions, catering to the needs of the different sub-communities within which they were developed and reflecting the different practical uses for which they are intended. The purpose of the present article is to provide a comprehensive survey of the salient modelling methods which can be found in the literature. The article concludes with a discussion of open problems which may form the basis for fruitful future research.
- Sep 25 2017 cs.MA arXiv:1709.07658v1The introduction of autonomous vehicles (AVs) will have far-reaching effects on road traffic in cities and on highways.The implementation of automated highway system (AHS), possibly with a dedicated lane only for AVs, is believed to be a requirement to maximise the benefit from the advantages of AVs. We study the ramifications of an increasing percentage of AVs on the traffic system with and without the introduction of a dedicated AV lane on highways. We conduct an analytical evaluation of a simplified scenario and a macroscopic simulation of the city of Singapore under user equilibrium conditions with a realistic traffic demand. We present findings regarding average travel time, fuel consumption, throughput and road usage. Instead of only considering the highways, we also focus on the effects on the remaining road network. Our results show a reduction of average travel time and fuel consumption as a result of increasing the portion of AVs in the system. We show that the introduction of an AV lane is not beneficial in terms of average commute time. Examining the effects of the AV population only, however, the AV lane provides a considerable reduction of travel time (approx. 25%) at the price of delaying conventional vehicles (approx. 7%). Furthermore a notable shift of travel demand away from the highways towards major and small roads is noticed in early stages of AV penetration of the system. Finally, our findings show that after a certain threshold percentage of AVs the differences between AV and no AV lane scenarios become negligible.
- Swarm systems constitute a challenging problem for reinforcement learning (RL) as the algorithm needs to learn decentralized control policies that can cope with limited local sensing and communication abilities of the agents. Although there have been recent advances of deep RL algorithms applied to multi-agent systems, learning communication protocols while simultaneously learning the behavior of the agents is still beyond the reach of deep RL algorithms. However, while it is often difficult to directly define the behavior of the agents, simple communication protocols can be defined more easily using prior knowledge about the given task. In this paper, we propose a number of simple communication protocols that can be exploited by deep reinforcement learning to find decentralized control policies in a multi-robot swarm environment. The protocols are based on histograms that encode the local neighborhood relations of the agents and can also transmit task-specific information, such as the shortest distance and direction to a desired target. In our framework, we use an adaptation of Trust Region Policy Optimization to learn complex collaborative tasks, such as formation building, building a communication link, and pushing an intruder. We evaluate our findings in a simulated 2D-physics environment, and compare the implications of different communication protocols.
- The incorporation of macro-actions (temporally extended actions) into multi-agent decision problems has the potential to address the curse of dimensionality associated with such decision problems. Since macro-actions last for stochastic durations, multiple agents executing decentralized policies in cooperative environments must act asynchronously. We present an algorithm that modifies Generalized Advantage Estimation for temporally extended actions, allowing a state-of-the-art policy optimization algorithm to optimize policies in Dec-POMDPs in which agents act asynchronously. We show that our algorithm is capable of learning optimal policies in two cooperative domains, one involving real-time bus holding control and one involving wildfire fighting with unmanned aircraft. Our algorithm works by framing problems as "event-driven decision processes," which are scenarios where the sequence and timing of actions and events are random and governed by an underlying stochastic process. In addition to optimizing policies with continuous state and action spaces, our algorithm also facilitates the use of event-driven simulators, which do not require time to be discretized into time-steps. We demonstrate the benefit of using event-driven simulation in the context of multiple agents taking asynchronous actions. We show that fixed time-step simulation risks obfuscating the sequence in which closely-separated events occur, adversely affecting the policies learned. Additionally, we show that arbitrarily shrinking the time-step scales poorly with the number of agents.
- Inspired by biological swarms, robotic swarms are envisioned to solve real-world problems that are difficult for individual agents. Biological swarms can achieve collective intelligence based on local interactions and simple rules; however, designing effective distributed policies for large-scale robotic swarms to achieve a global objective can be challenging. Although it is often possible to design an optimal centralized strategy for smaller numbers of agents, those methods can fail as the number of agents increases. Motivated by the growing success of machine learning, we develop a deep learning approach that learns distributed coordination policies from centralized policies. In contrast to traditional distributed control approaches, which are usually based on human-designed policies for relatively simple tasks, this learning-based approach can be adapted to more difficult tasks. We demonstrate the efficacy of our proposed approach on two different tasks, the well-known rendezvous problem and a more difficult particle assignment problem. For the latter, no known distributed policy exists. From extensive simulations, it is shown that the performance of the learned coordination policies is comparable to the centralized policies, surpassing state-of-the-art distributed policies. Thereby, our proposed approach provides a promising alternative for real-world coordination problems that would be otherwise computationally expensive to solve or intangible to explore.
- In this paper, we investigate how to learn to control a group of cooperative agents with limited sensing capabilities such as robot swarms. The agents have only very basic sensor capabilities, yet in a group they can accomplish sophisticated tasks, such as distributed assembly or search and rescue tasks. Learning a policy for a group of agents is difficult due to distributed partial observability of the state. Here, we follow a guided approach where a critic has central access to the global state during learning, which simplifies the policy evaluation problem from a reinforcement learning point of view. For example, we can get the positions of all robots of the swarm using a camera image of a scene. This camera image is only available to the critic and not to the control policies of the robots. We follow an actor-critic approach, where the actors base their decisions only on locally sensed information. In contrast, the critic is learned based on the true global state. Our algorithm uses deep reinforcement learning to approximate both the Q-function and the policy. The performance of the algorithm is evaluated on two tasks with simple simulated 2D agents: 1) finding and maintaining a certain distance to each others and 2) locating a target.
- Cooperative motion planning is still a challenging task for robots. Recently, Value Iteration Networks (VINs) were proposed to model motion planning tasks as Neural Networks. In this work, we extend VINs to solve cooperative planning tasks under non-holonomic constraints. For this, we interconnect multiple VINs to pay respect to each other's outputs. Policies for cooperation are generated via iterative gradient descend. Validation in simulation shows that the resulting networks can resolve non-holonomic motion planning problems that require cooperation.
- Sep 26 2017 cs.MA arXiv:1709.08235v1