In this paper, we investigate how to learn to control a group of cooperative agents with limited sensing capabilities such as robot swarms. The agents have only very basic sensor capabilities, yet in a group they can accomplish sophisticated tasks, such as distributed assembly or search and rescue tasks. Learning a policy for a group of agents is difficult due to distributed partial observability of the state. Here, we follow a guided approach where a critic has central access to the global state during learning, which simplifies the policy evaluation problem from a reinforcement learning point of view. For example, we can get the positions of all robots of the swarm using a camera image of a scene. This camera image is only available to the critic and not to the control policies of the robots. We follow an actor-critic approach, where the actors base their decisions only on locally sensed information. In contrast, the critic is learned based on the true global state. Our algorithm uses deep reinforcement learning to approximate both the Q-function and the policy. The performance of the algorithm is evaluated on two tasks with simple simulated 2D agents: 1) finding and maintaining a certain distance to each others and 2) locating a target.
Decentralized control of robots has attracted huge research interests. However, some of the research used unrealistic assumptions without collision avoidance. This report focuses on the collision-free control for multiple robots in both complete coverage and search tasks in 2D and 3D areas which are arbitrary unknown. All algorithms are decentralized as robots have limited abilities and they are mathematically proved. The report starts with the grid selection in the two tasks. Grid patterns simplify the representation of the area and robots only need to move straightly between neighbor vertices. For the 100% complete 2D coverage, the equilateral triangular grid is proposed. For the complete coverage ignoring the boundary effect, the grid with the fewest vertices is calculated in every situation for both 2D and 3D areas. The second part is for the complete coverage in 2D and 3D areas. A decentralized collision-free algorithm with the above selected grid is presented driving robots to sections which are furthest from the reference point. The area can be static or expanding, and the algorithm is simulated in MATLAB. Thirdly, three grid-based decentralized random algorithms with collision avoidance are provided to search targets in 2D or 3D areas. The number of targets can be known or unknown. In the first algorithm, robots choose vacant neighbors randomly with priorities on unvisited ones while the second one adds the repulsive force to disperse robots if they are close. In the third algorithm, if surrounded by visited vertices, the robot will use the breadth-first search algorithm to go to one of the nearest unvisited vertices via the grid. The second search algorithm is verified on Pioneer 3-DX robots. The general way to generate the formula to estimate the search time is demonstrated. Algorithms are compared with five other algorithms in MATLAB to show their effectiveness.
In this paper, the problem of energy trading between smart grid prosumers, who can simultaneously consume and produce energy, and a grid power company is studied. The problem is formulated as a single-leader, multiple-follower Stackelberg game between the power company and multiple prosumers. In this game, the power company acts as a leader who determines the pricing strategy that maximizes its profits, while the prosumers act as followers who react by choosing the amount of energy to buy or sell so as to optimize their current and future profits. The proposed game accounts for each prosumer's subjective decision when faced with the uncertainty of profits, induced by the random future price. In particular, the framing effect, from the framework of prospect theory (PT), is used to account for each prosumer's valuation of its gains and losses with respect to an individual utility reference point. The reference point changes between prosumers and stems from their past experience and future aspirations of profits. The followers' noncooperative game is shown to admit a unique pure-strategy Nash equilibrium (NE) under classical game theory (CGT) which is obtained using a fully distributed algorithm. The results are extended to account for the case of PT using algorithmic solutions that can achieve an NE under certain conditions. Simulation results show that the total grid load varies significantly with the prosumers' reference point and their loss-aversion level. In addition, it is shown that the power company's profits considerably decrease when it fails to account for the prosumers' subjective perceptions under PT.
We study the problem of distributed maximum computation in an open multi-agent system, where agents can leave and arrive during the execution of the algorithm. The main challenge comes from the possibility that the agent holding the largest value leaves the system, which changes the value to be computed. The algorithms must as a result be endowed with mechanisms allowing to forget outdated information. The focus is on systems in which interactions are pairwise gossips between randomly selected agents. We consider situations where leaving agents can send a last message, and situations where they cannot. For both cases, we provide algorithms able to eventually compute the maximum of the values held by agents.