Robots operate in environments with varying implicit structure. For instance, a helicopter flying over terrain encounters a very different arrangement of obstacles than a robotic arm manipulating objects on a cluttered table top. State-of-the-art motion planning systems do not exploit this structure, thereby expending valuable planning effort searching for implausible solutions. We are interested in planning algorithms that actively infer the underlying structure of the valid configuration space during planning in order to find solutions with minimal effort. Consider the problem of evaluating edges on a graph to quickly discover collision-free paths. Evaluating edges is expensive, both for robots with complex geometries like robot arms, and for robots with limited onboard computation like UAVs. Until now, this challenge has been addressed via laziness i.e. deferring edge evaluation until absolutely necessary, with the hope that edges turn out to be valid. However, all edges are not alike in value - some have a lot of potentially good paths flowing through them, and some others encode the likelihood of neighbouring edges being valid. This leads to our key insight - instead of passive laziness, we can actively choose edges that reduce the uncertainty about the validity of paths. We show that this is equivalent to the Bayesian active learning paradigm of decision region determination (DRD). However, the DRD problem is not only combinatorially hard, but also requires explicit enumeration of all possible worlds. We propose a novel framework that combines two DRD algorithms, DIRECT and BISECT, to overcome both issues. We show that our approach outperforms several state-of-the-art algorithms on a spectrum of planning problems for mobile robots, manipulators and autonomous helicopters.
This paper describes autonomous racing of RC race cars based on mathematical optimization. Using a dynamical model of the vehicle, control inputs are computed by receding horizon based controllers, where the objective is to maximize progress on the track subject to the requirement of staying on the track and avoiding opponents. Two different control formulations are presented. The first controller employs a two-level structure, consisting of a path planner and a nonlinear model predictive controller (NMPC) for tracking. The second controller combines both tasks in one nonlinear optimization problem (NLP) following the ideas of contouring control. Linear time varying models obtained by linearization are used to build local approximations of the control NLPs in the form of convex quadratic programs (QPs) at each sampling time. The resulting QPs have a typical MPC structure and can be solved in the range of milliseconds by recent structure exploiting solvers, which is key to the real-time feasibility of the overall control scheme. Obstacle avoidance is incorporated by means of a high-level corridor planner based on dynamic programming, which generates convex constraints for the controllers according to the current position of opponents and the track layout. The control performance is investigated experimentally using 1:43 scale RC race cars, driven at speeds of more than 3 m/s and in operating regions with saturated rear tire forces (drifting). The algorithms run at 50 Hz sampling rate on embedded computing platforms, demonstrating the real-time feasibility and high performance of optimization-based approaches for autonomous racing.
A robot that can carry out a natural-language instruction has been a dream since before the Jetsons cartoon series imagined a life of leisure mediated by a fleet of attentive robot helpers. It is a dream that remains stubbornly distant. However, recent advances in vision and language methods have made incredible progress in closely related areas. This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering. Both tasks can be interpreted as visually grounded sequence-to-sequence translation problems, and many of the same methods are applicable. To enable and encourage the application of vision and language methods to the problem of interpreting visually-grounded navigation instructions, we present the Matterport3D Simulator -- a large-scale reinforcement learning environment based on real imagery. Using this simulator, which can in future support a range of embodied vision and language tasks, we provide the first benchmark dataset for visually-grounded natural language navigation in real buildings -- the Room-to-Room (R2R) dataset.
Nov 21 2017 cs.RO
Nowadays, the construction of a complex robotic system requires a high level of specialization in a large number of diverse scientific areas. It is reasonable that a single researcher cannot create from scratch the entirety of this system, as it is impossible for him to have the necessary skills in the necessary fields. This obstacle is being surpassed with the existent robotic frameworks. This paper tries to give an extensive review of the most famous robotic frameworks and middleware, as well as to provide the means to effortlessly compare them. Additionally, we try to investigate the differences between the definitions of a robotic framework, a robotic middleware and a robotic architecture.
This paper introduces a novel neural network-based reinforcement learning approach for robot gaze control. Our approach enables a robot to learn and adapt its gaze control strategy for human-robot interaction without the use of external sensors or human supervision. The robot learns to focus its attention on groups of people from its own audio-visual experiences, and independently of the number of people in the environment, their position and physical appearance. In particular, we use recurrent neural networks and Q-learning to find an optimal action-selection policy, and we pretrain on a synthetic environment that simulates sound sources and moving participants to avoid the need of interacting with people for hours. Our experimental evaluation suggests that the proposed method is robust in terms of parameters configuration (i.e. the selection of the parameter values has not a decisive impact on the performance). The best results are obtained when audio and video information are jointly used, and when a late fusion strategy is employed (i.e. when both sources of information are separately processed and then fused). Successful experiments on a real environment with the Nao robot indicate that our framework is a step forward towards the autonomous learning of a perceivable and socially acceptable gaze behavior.
Deep reinforcement learning algorithms can learn complex behavioral skills, but real-world application of these methods requires a large amount of experience to be collected by the agent. In practical settings, such as robotics, this involves repeatedly attempting a task, resetting the environment between each attempt. However, not all tasks are easily or automatically reversible. In practice, this learning process requires extensive human intervention. In this work, we propose an autonomous method for safe and efficient reinforcement learning that simultaneously learns a forward and reset policy, with the reset policy resetting the environment for a subsequent attempt. By learning a value function for the reset policy, we can automatically determine when the forward policy is about to enter a non-reversible state, providing for uncertainty-aware safety aborts. Our experiments illustrate that proper use of the reset policy can greatly reduce the number of manual resets required to learn a task, can reduce the number of unsafe actions that lead to non-reversible states, and can automatically induce a curriculum.
Nov 21 2017 cs.RO
This paper studies the underlying combinatorial structure of a class of object rearrangement problems, which appear frequently in applications. The problems involve multiple, similar-geometry objects placed on a flat, horizontal surface, where a robot can approach them from above and perform pick-and-place operations to rearrange them. The paper considers both the case where the start and goal object poses overlap, and where they do not. For overlapping poses, the primary objective is to minimize the number of pick-and-place actions and then to minimize the distance traveled by the end-effector. For the non-overlapping case, the objective is solely to minimize the travel distance of the end-effector. While such problems do not involve all the complexities of general rearrangement, they remain computationally hard in both cases. This is shown through reductions from well-understood, hard combinatorial challenges to these rearrangement problems. The reductions are also shown to hold in the reverse direction, which enables the convenient application on rearrangement of well studied algorithms. These algorithms can be very efficient in practice despite the hardness results. The paper builds on these reduction results to propose an algorithmic pipeline for dealing with the rearrangement problems. Experimental evaluation, including hardware-based trials, shows that the proposed pipeline computes high-quality paths with regards to the optimization objectives. Furthermore, it exhibits highly desirable scalability as the number of objects increases in both the overlapping and non-overlapping setup.
Organisms result from multiple adaptive processes occurring and interacting at different time scales. One such interaction is that between development and evolution. In modeling studies, it has been shown that development sweeps over a series of traits in a single agent, and sometimes exposes promising static traits. Subsequent evolution can then canalize these rare traits. Thus, development can, under the right conditions, increase evolvability. Here, we report on a previously unknown phenomenon when embodied agents are allowed to develop and evolve: Evolution discovers body plans which are robust to control changes, these body plans become genetically assimilated, yet controllers for these agents are not assimilated. This allows evolution to continue climbing fitness gradients by tinkering with the developmental programs for controllers within these permissive body plans. This exposes a previously unknown detail about the Baldwin effect: instead of all useful traits becoming genetically assimilated, only phenotypic traits that render the agent robust to changes in other traits become assimilated. We refer to this phenomenon as differential canalization. This finding also has important implications for the evolutionary design of artificial and embodied agents such as robots: robots that are robust to internal changes in their controllers may also be robust to external changes in their environment, such as transferal from simulation to reality, or deployment in novel environments.
Nov 21 2017 cs.RO
In this work, we develop an approach for guiding robots to automatically localize and find the shapes of tumors and other stiff inclusions present in the anatomy. Our approach uses Gaussian processes to model the stiffness distribution and active learning to direct the palpation path of the robot. The palpation paths are chosen such that they maximize an acquisition function provided by an active learning algorithm. Our approach provides the flexibility to avoid obstacles in the robot's path, incorporate uncertainties in robot position and sensor measurements, include prior information about location of stiff inclusions while respecting the robot-kinematics. To the best of our knowledge this is the first work in literature that considers all the above conditions while localizing tumors. The proposed framework is evaluated via simulation and experimentation on three different robot platforms: 6-DoF industrial arm, da Vinci Research Kit (dVRK), and the Insertable Robotic Effector Platform (IREP). Results show that our approach can accurately estimate the locations and boundaries of the stiff inclusions while reducing exploration time.
Nov 21 2017 cs.RO
This paper aims to present a stability control strategy under lateral impact for quadruped robot with laterally movable joints. We firstly put up with five necessary conditions for keeping balance. We extend the classical four-neuron Central Pattern Generator (CPG) network based on Hopf oscillators to eight-neuron network with four more trigger-enabled neurons at four laterally movable joints, which can be triggered by the signal transmitted by the lateral acceleration sensor. Such network can coordinate the lateral and longitudinal trotting gait. Based on Zero Movement Point (ZMP) theory, the robot is modeled as an inverted pendulum to plan the Center of Gravity (CoG) position and calculate the needed step length. With the help of lateral trotting, the lateral acceleration of the quadruped robot after lateral impact can regain to the normal range in a short time. Simulation shows that the robot can resist lateral acceleration of up to 1.5g.