May 23 2018 cs.RO
In this paper we introduce a novel framework for expressing and learning force-sensitive robot manipulation skills. It is based on a formalism that extends our previous work on adaptive impedance control with meta parameter learning and compatible skill specifications. In this way the system is also able to make use of abstract expert knowledge by incorporating process descriptions and quality evaluation metrics. We evaluate various state-of-the-art schemes for the meta parameter learning and experimentally compare selected ones. Our results clearly indicate that the combination of our adaptive impedance controller with a carefully defined skill formalism significantly reduces the complexity of manipulation tasks, even for learning peg-in-hole with submillimeter industrial tolerances. Overall, the system is able to learn variations of this skill in under 20 minutes. In fact, in experiments the system performed the learned tasks faster than humans, making it the first learning-based solution to complex assembly at such real-world performance levels.
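The framework above builds on impedance control. The sketch below illustrates only the underlying idea. It is not the authors' adaptive controller or their meta parameter learner. A basic one-dimensional impedance law makes the commanded force a spring-damper function of the tracking error. The stiffness K, damping D, and feedforward force are the kind of quantities a meta parameter learner might tune. All names and numbers are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class ImpedanceGains:
    stiffness: float  # K in N/m (assumed units)
    damping: float    # D in N*s/m (assumed units)


def impedance_force(gains: ImpedanceGains, x_desired: float, x: float,
                    v_desired: float, v: float, f_feedforward: float = 0.0) -> float:
    """Basic (non-adaptive) impedance law: F = K (x_d - x) + D (v_d - v) + F_ff."""
    return (gains.stiffness * (x_desired - x)
            + gains.damping * (v_desired - v)
            + f_feedforward)


# Hypothetical peg-insertion step: command 1 cm deeper with a soft spring.
gains = ImpedanceGains(stiffness=500.0, damping=50.0)
force = impedance_force(gains, x_desired=0.01, x=0.0, v_desired=0.0, v=0.0)
```

A low stiffness lets the peg comply with contact forces instead of jamming. The adaptive scheme described in the abstract adjusts such parameters online, which this sketch does not attempt.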
A robust Model Predictive Control (MPC) approach for controlling the front steering of an autonomous vehicle is presented in this paper. We present several approaches to increasing the robustness of MPC: weight tuning, successive on-line linearization of a nonlinear vehicle model to track the position error, and successive on-line linearization to track the velocity error. The effectiveness of each method in terms of accuracy and computational load is discussed.
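Successive on-line linearization means the nonlinear vehicle model is re-linearized around the current operating point at every control step. The resulting matrices A = df/dx and B = df/du then define the linear MPC problem for that step. The abstract does not specify the vehicle model, so the sketch below assumes a kinematic bicycle model with a hypothetical wheelbase and sampling time. It computes the Jacobians by central differences and omits the QP solver that would consume A and B.

```python
import math

WHEELBASE = 2.7  # m, assumed vehicle parameter
DT = 0.05        # s, assumed sampling time


def step(state, control):
    """Euler-discretized kinematic bicycle model (an assumed stand-in for the paper's model).
    state = [x, y, yaw, v], control = [steering angle, acceleration]."""
    x, y, yaw, v = state
    steer, accel = control
    return [
        x + DT * v * math.cos(yaw),
        y + DT * v * math.sin(yaw),
        yaw + DT * v / WHEELBASE * math.tan(steer),
        v + DT * accel,
    ]


def linearize(state, control, eps=1e-6):
    """Central-difference Jacobians A = df/dx and B = df/du at the current operating point."""
    def jacobian(evaluate, size):
        columns = []
        for i in range(size):
            plus, minus = evaluate(i, eps), evaluate(i, -eps)
            columns.append([(p - m) / (2 * eps) for p, m in zip(plus, minus)])
        return [list(row) for row in zip(*columns)]  # columns -> row-major matrix

    def shift(vec, i, delta):
        shifted = list(vec)
        shifted[i] += delta
        return shifted

    A = jacobian(lambda i, d: step(shift(state, i, d), control), len(state))
    B = jacobian(lambda i, d: step(state, shift(control, i, d)), len(control))
    return A, B


# Re-linearize around the current state (driving straight at 10 m/s) and the last input.
A, B = linearize([0.0, 0.0, 0.0, 10.0], [0.0, 0.0])
```

Numerical differentiation keeps the sketch model-agnostic. An implementation with a fixed model would typically use the analytic Jacobians instead.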
This paper presents experimental results on learning-based randomized bin-picking combined with iterative visual recognition. We use a random forest to predict, from depth images of the pile, whether a robot will successfully pick an object, taking into account collisions between a finger and neighboring objects. To make the discriminator accurate, we estimate object poses by merging multiple depth images of the pile captured from different viewpoints with a depth sensor attached to the wrist. We show that even when a pick is predicted to fail from a single depth image because of a large occluded area, it is eventually predicted to succeed once multiple depth images are merged. In addition, we show that the random forest can be trained with a small amount of training data.
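The abstract does not detail how the depth images are merged. A minimal sketch, under the assumption that each wrist-camera pose (R, t) in the world frame is known from the robot's kinematics and a hand-eye calibration, is to map every view into the world frame. The views are then pooled, keeping one point per voxel so that overlapping surfaces are not duplicated. The function names and the 5 mm voxel size are hypothetical. The random forest classifier itself is not reproduced.

```python
import math


def to_world(points, rotation, translation):
    """Map camera-frame points into the world frame: p_w = R p_c + t."""
    return [tuple(sum(rotation[r][c] * p[c] for c in range(3)) + translation[r]
                  for r in range(3))
            for p in points]


def merge_views(views, voxel_size=0.005):
    """Merge depth views, keeping one point per voxel so overlapping surfaces are not duplicated.

    views: list of (camera-frame points, R, t), where R and t are the assumed-known
    wrist-camera pose in the world frame."""
    merged = {}
    for points, rotation, translation in views:
        for p in to_world(points, rotation, translation):
            voxel = tuple(math.floor(coord / voxel_size) for coord in p)
            merged.setdefault(voxel, p)
    return list(merged.values())


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Two views of the same surface point from cameras 10 cm apart, plus a point only the second view sees.
view_a = ([(0.1, 0.0, 0.5)], IDENTITY, [0.0, 0.0, 0.0])
view_b = ([(0.0, 0.0, 0.5), (0.0, 0.2, 0.5)], IDENTITY, [0.1, 0.0, 0.0])
cloud = merge_views([view_a, view_b])
```

Points that are occluded in one view but visible in another end up in the merged cloud. This is what allows a pick that looked risky from a single image to be re-evaluated.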
Recently there has been rising interest in using deep reinforcement learning to train agents, embodied in virtual environments, to perform language-directed tasks. In this paper, we propose a simple but effective neural language grounding module for embodied agents that can be trained end-to-end from scratch, taking raw pixels, unstructured linguistic commands, and sparse rewards as inputs. We model the language grounding process as a language-guided transformation of visual features, where latent sentence embeddings are used as the transformation matrices. In several language-directed navigation tasks that feature challenging partial observation and require simple reasoning, our module significantly outperforms the state of the art. We also release XWORLD 3D, an easy-to-customize 3D environment that can potentially be modified to evaluate a variety of embodied agents.
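Using a sentence embedding as a transformation matrix can be pictured as follows. The flat embedding is reshaped into a matrix, which is then applied to the visual feature vector at every spatial position. This is equivalent to a 1x1 convolution whose weights come from the command. The sketch below is a simplification. The dimensions are hypothetical, the feature map is flattened to a list of positions, and the learned sentence encoder that would produce the embedding is not shown. Whether the paper stacks several such transformations or adds nonlinearities is not stated in the abstract.

```python
def sentence_to_matrix(embedding, out_channels, in_channels):
    """Reshape a flat sentence embedding into an out_channels x in_channels matrix."""
    if len(embedding) != out_channels * in_channels:
        raise ValueError("embedding size must equal out_channels * in_channels")
    return [embedding[r * in_channels:(r + 1) * in_channels] for r in range(out_channels)]


def language_guided_transform(feature_map, embedding, out_channels):
    """Apply the sentence-derived matrix to the feature vector at every spatial position
    (equivalent to a 1x1 convolution whose weights come from the command)."""
    in_channels = len(feature_map[0])
    matrix = sentence_to_matrix(embedding, out_channels, in_channels)
    return [[sum(w * f for w, f in zip(row, features)) for row in matrix]
            for features in feature_map]


# Two spatial positions with 2 visual channels each; the "sentence" yields a 3x2 matrix.
features = [[1.0, 2.0], [3.0, 4.0]]
embedding = [1.0, 0.0, 0.0, 1.0, 1.0, 1.0]
grounded = language_guided_transform(features, embedding, out_channels=3)
# grounded == [[1.0, 2.0, 3.0], [3.0, 4.0, 7.0]]
```

Because the same matrix is shared across positions, a different command re-weights the visual channels everywhere at once. This is one way to read "language-guided transformation of visual features".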
Handling object interaction is a fundamental challenge in practical multi-object tracking, even for simple interactive effects such as one object temporarily occluding another. We formalize the problem of occlusion in tracking with two different abstractions. In object-wise occlusion, objects that are occluded by other objects do not generate measurements. In measurement-wise occlusion, a previously unstudied approach, all objects may generate measurements but some measurements may be occluded by others. While the relative validity of each abstraction depends on the situation and sensor, measurement-wise occlusion fits into probabilistic multi-object tracking algorithms with much looser assumptions on object interaction. Its value is demonstrated by showing that it naturally derives a popular approximation for lidar tracking, and by an example of visual tracking in image space.
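To make measurement-wise occlusion concrete, consider a lidar-like setting in which every object may generate measurements, but a measurement is hidden when a nearer measurement lies along the same bearing. The sketch below is a deterministic simplification with an assumed 1-degree bearing bin. The abstraction described above is probabilistic and is used inside multi-object tracking, which this sketch does not attempt.

```python
import math


def visible_measurements(measurements, bin_width_rad=math.radians(1.0)):
    """Measurement-wise occlusion, deterministic version: in each bearing bin only the nearest
    measurement survives; farther ones are occluded regardless of which object produced them."""
    nearest = {}
    for x, y in measurements:
        bearing_bin = math.floor(math.atan2(y, x) / bin_width_rad)
        distance = math.hypot(x, y)
        if bearing_bin not in nearest or distance < nearest[bearing_bin][0]:
            nearest[bearing_bin] = (distance, (x, y))
    return sorted(point for _, point in nearest.values())


# Object A returns (10, 0); object B returns (5, 0) and (0, 5). B's first return hides A's.
visible = visible_measurements([(10.0, 0.0), (5.0, 0.0), (0.0, 5.0)])
# visible == [(0.0, 5.0), (5.0, 0.0)]
```

Under object-wise occlusion, object A would simply produce no measurements. Here A still exists as a measurement source, and only its individual return behind B is suppressed.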
The Swarmathon is a swarm robotics programming challenge that engages college students from minority-serving institutions in NASA's Journey to Mars. Teams compete by programming a group of robots to search for, pick up, and drop off resources in a collection zone. The Swarmathon produces prototypes for robot swarms that would collect resources on the surface of Mars. Robots operate completely autonomously with no global map, and each team's algorithm must be flexible enough to find resources from a variety of unknown distributions. The Swarmathon includes Physical and Virtual Competitions. Physical competitors test their algorithms on robots they build at their schools; they then upload their code to run autonomously on identical robots during the three-day competition in an outdoor arena at Kennedy Space Center. Virtual competitors complete an identical challenge in simulation. Participants also mentor local teams that compete in a separate High School Division. In the first 2 years, over 1,100 students participated, 63% of whom were from underrepresented ethnic and racial groups. Participants had significant gains in both interest and core robotic competencies that were equivalent across gender and racial groups, suggesting that the Swarmathon is effectively educating a diverse population of future roboticists.
In many real-world robotic applications, an autonomous agent must act within and explore a partially observed environment that is unobserved by its human teammate. We consider such a setting in which the agent can, while acting, transmit declarative information to the human that helps them understand aspects of this unseen environment. Importantly, we should expect the human to have preferences about what information they are given and when they are given it. In this work, we adopt an information-theoretic view of the human's preferences: the human scores a piece of information as a function of the induced reduction in weighted entropy of their belief about the environment state. We formulate this setting as a POMDP and give a practical algorithm for solving it approximately. Then, we give an algorithm that allows the agent to sample-efficiently learn the human's preferences online. Finally, we describe an extension in which the human's preferences are time-varying. We validate our approach experimentally in two planning domains: a 2D robot mining task and a more realistic 3D robot fetching task.
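The preference model above scores a piece of information by how much it reduces the weighted entropy of the human's belief. The sketch below uses one common form of weighted entropy, H_w(b) = -sum_s w(s) b(s) log b(s). The exact weighting used in the paper is not given in the abstract. A declarative statement is modeled, as an assumption, as a predicate over environment states that the human conditions on. The POMDP solver and the online preference learning are not shown.

```python
import math


def weighted_entropy(belief, weights):
    """H_w(b) = -sum_s w(s) b(s) log b(s); weights encode how much the human cares about each state."""
    return -sum(weights[s] * p * math.log(p) for s, p in belief.items() if p > 0.0)


def condition(belief, consistent):
    """Bayesian update on a declarative statement: drop inconsistent states and renormalize."""
    mass = sum(p for s, p in belief.items() if consistent(s))
    return {s: (p / mass if consistent(s) else 0.0) for s, p in belief.items()}


def information_score(belief, weights, consistent):
    """Score of a statement = reduction in the human's weighted entropy it induces."""
    return weighted_entropy(belief, weights) - weighted_entropy(condition(belief, consistent), weights)


# Hypothetical mining example: ore is in exactly one of four cells, all equally likely.
belief = {cell: 0.25 for cell in range(4)}
weights = {cell: 1.0 for cell in range(4)}
# Statement "the ore is in cell 0 or 1" halves the hypothesis space.
score = information_score(belief, weights, lambda cell: cell in (0, 1))
```

With uniform weights this reduces to ordinary entropy reduction, and halving the hypothesis space yields log 2. Non-uniform weights make statements about states the human cares about score higher.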
In this paper, we study the multi-robot task allocation problem, in which a group of robots must be allocated to a set of tasks so that the tasks are completed optimally. A task may need more than one robot, so the robots must form coalitions to complete the tasks. Multi-robot coalition formation for task allocation is a well-known NP-hard problem. To solve it, we use a linear-programming-based graph partitioning approach together with a region growing strategy that allocates (near-)optimal robot coalitions to tasks in a negligible amount of time. Our proposed algorithm is fast (taking only 230 seconds for 100 robots and 10 tasks) and finds near-optimal solutions (up to 97.66% of the optimal value). We empirically demonstrate that the proposed approach always finds a solution that is closer to the optimal solution (by up to 9.1 times) than a theoretical worst-case bound proved in earlier work.
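To illustrate why exact coalition formation is intractable, here is a brute-force reference solver. It is not the linear-programming graph partitioning and region growing approach described above. It assumes each robot joins at most one task's coalition and that the total value is a sum of per-task coalition values. Under that model there are (T + 1)^R assignments, already 11^100 for 100 robots and 10 tasks, so exhaustive search is only usable on toy instances. The value function in the example is hypothetical.

```python
from itertools import product


def optimal_coalitions(num_robots, num_tasks, coalition_value):
    """Exhaustive search: each robot joins one task's coalition or stays idle (None).
    There are (num_tasks + 1) ** num_robots assignments, so this only works for tiny instances."""
    best_value, best_assignment = float("-inf"), None
    for assignment in product([None, *range(num_tasks)], repeat=num_robots):
        value = sum(
            coalition_value(task, frozenset(r for r, t in enumerate(assignment) if t == task))
            for task in range(num_tasks)
        )
        if value > best_value:
            best_value, best_assignment = value, assignment
    return best_value, best_assignment


def example_value(task, coalition):
    """Hypothetical value: the task needs at least 2 robots, and each robot costs 1."""
    return (10.0 if len(coalition) >= 2 else 0.0) - len(coalition)


best, assignment = optimal_coalitions(num_robots=3, num_tasks=1, coalition_value=example_value)
```

On instances small enough to solve exactly, dividing a heuristic's value by this optimum is one way to express a quality figure such as a percentage of the optimal. The abstract does not describe the paper's own evaluation protocol.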
3D registration is conventionally performed in engineering practice by invoking singular value decomposition (SVD) or eigenvalue decomposition (EIG). However, such numerical algorithms suffer from uncertain convergence in many cases. Following our recent publication in this journal, this paper proposes a novel fast symbolic solution. An equivalence analysis shows that our previous solver can be converted to handle the 3D registration problem. Moreover, the computation procedure is further simplified so that no support for complex numbers is required. Experimental results show that the proposed solver does not lose accuracy or robustness, while improving execution speed by roughly 50% to 80% on both a personal computer and an embedded processor.
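For context, the conventional EIG route can be sketched with Horn's closed-form quaternion method. The optimal rotation is the eigenvector belonging to the largest eigenvalue of a 4x4 symmetric matrix built from the cross-covariance of the centered point sets. This is not the proposed symbolic solver. The cyclic Jacobi eigensolver used here is exactly the kind of iterative numerical routine whose convergence behavior the abstract criticizes. The code is pure Python, without NumPy, so that it is self-contained, and all function names are hypothetical.

```python
import math


def _jacobi_eigen(a, max_sweeps=50):
    """Eigen-decompose a small symmetric matrix with cyclic Jacobi rotations.
    Returns (eigenvalues, eigenvectors stored as columns)."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V P
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def register_horn(source, target):
    """Least-squares rigid transform with target ~= R source + t (Horn's quaternion / EIG method)."""
    n = len(source)
    cs = [sum(p[i] for p in source) / n for i in range(3)]
    ct = [sum(q[i] for q in target) / n for i in range(3)]
    # Cross-covariance S[a][b] = sum over point pairs of (p_a - cs_a) * (q_b - ct_b).
    S = [[sum((p[a] - cs[a]) * (q[b] - ct[b]) for p, q in zip(source, target))
          for b in range(3)] for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    N = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    values, vectors = _jacobi_eigen(N)
    k = max(range(4), key=lambda i: values[i])
    w, x, y, z = (vectors[r][k] for r in range(4))  # unit quaternion of the best rotation
    R = [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]
    t = [ct[r] - sum(R[r][c] * cs[c] for c in range(3)) for r in range(3)]
    return R, t


# Rotate 90 degrees about z and translate by (1, 2, 3).
src = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
dst = [(1.0, 3.0, 3.0), (0.0, 2.0, 3.0), (1.0, 2.0, 4.0), (0.0, 3.0, 4.0)]
R, t = register_horn(src, dst)
```

A symbolic solver replaces the iterative `_jacobi_eigen` step with closed-form expressions. The abstract credits this substitution with the reported speedup.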
A detailed environment perception is a crucial component of automated vehicles. However, to deal with the amount of perceived information, we also require segmentation strategies. Based on a grid map environment representation, which is well suited to sensor fusion, free-space estimation, and machine learning, we detect and classify objects using deep convolutional neural networks. As input for our networks we use a multi-layer grid map that efficiently encodes 3D range sensor information. The inference output consists of a list of rotated bounding boxes with associated semantic classes. We conduct extensive ablation studies, highlight important design considerations when using grid maps, and evaluate our models on the KITTI Bird's Eye View benchmark. Qualitative and quantitative benchmark results show that we achieve robust detection and state-of-the-art accuracy solely using top-view grid maps from range sensor data.
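A multi-layer grid map can be built by binning range-sensor points into top-view cells and computing several per-cell statistics, with each statistic forming one input layer for the network. The layers chosen below are assumptions, since the abstract does not list the paper's layers: detection count, minimum and maximum height, and mean intensity. The 10 cm cell size, the 40 m extent, and the sparse dictionary layout are also illustrative. A real pipeline would produce a dense tensor.

```python
import math


def encode_grid_map(points, cell_size=0.1, half_extent=40.0):
    """Bin (x, y, z, intensity) points into top-view cells and compute per-cell layers:
    detection count, min/max height, and mean intensity (assumed layer choice)."""
    cells = {}
    for x, y, z, intensity in points:
        if abs(x) >= half_extent or abs(y) >= half_extent:
            continue  # outside the mapped area
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        count, z_min, z_max, i_sum = cells.get(key, (0, math.inf, -math.inf, 0.0))
        cells[key] = (count + 1, min(z_min, z), max(z_max, z), i_sum + intensity)
    return {key: {"detections": c, "min_z": lo, "max_z": hi, "mean_intensity": s / c}
            for key, (c, lo, hi, s) in cells.items()}


scan = [(0.05, 0.05, 0.2, 10.0), (0.07, 0.02, 1.5, 30.0), (2.05, 2.05, 0.0, 5.0), (50.0, 0.0, 0.0, 1.0)]
grid = encode_grid_map(scan)
```

Height layers help separate objects from the ground plane, and the detection count reflects how densely a cell was observed. Both are the kind of range information a top-view detector can exploit.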
In recent years, the Industry 4.0 initiative has driven industrial and scientific approaches to building self-driving cars and smart factories. Agricultural applications benefit from both advances, as agricultural machines are in effect mobile factories that process their environment. Accurate perception of the surroundings is therefore crucial because, in contrast to standard indoor production lines, the environment contains the goods to be processed. Environmental processing requires accurate and robust quantification in order to correctly adjust processing parameters and to detect hazards during processing. While today's approaches still implement functional elements based on a single particular set of sensors, a unified representation of the environment compiled from all available information sources may prove more versatile, sufficient, and cost-effective. The key to this approach is developing a common information language from the data provided. In this paper, we introduce and discuss techniques to build so-called inverse sensor models that create a common information language among different, but typically agricultural, information providers. These can be current live sensor data, farm management systems, or long-term information generated from previous processing, drones, or satellites. In the context of Industry 4.0, this enables interoperability between different agricultural systems and provides information transparency.
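An inverse sensor model maps a raw observation to a probability about the state of a map cell, p(state | observation). Once every information provider is expressed this way, the outputs can be combined. The sketch below assumes one standard combination scheme: log-odds fusion, which treats the sources as conditionally independent given the cell state. The abstract does not say which fusion scheme the paper uses. The cell state and the probabilities in the example are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(l):
    return 1.0 / (1.0 + math.exp(-l))


def fuse_cell(prior, source_probabilities):
    """Fuse per-cell outputs of several inverse sensor models, p(state | observation_i),
    in log-odds form, assuming the sources are conditionally independent given the cell state."""
    l_prior = logit(prior)
    total = l_prior + sum(logit(p) - l_prior for p in source_probabilities)
    return sigmoid(total)


# Hypothetical cell: a live camera model and a satellite-derived map both report p = 0.7.
fused = fuse_cell(0.5, [0.7, 0.7])
# fused is about 0.845, i.e. 49/58
```

Subtracting the prior log-odds once per source keeps the prior from being counted repeatedly. Sources that contradict each other, such as 0.7 and 0.3, cancel back to the prior.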
In the past few years, the technology of automated guided vehicles (AGVs) has advanced notably. In particular, in factory and warehouse automation, different approaches have been presented for detecting and localizing pallets inside warehouses and shop-floor environments based on data acquired from 2D laser rangefinders. In , we present a robust approach that allows AGVs to detect, localize, and track multiple pallets using machine learning techniques based on an on-board 2D laser rangefinder. This paper describes the data used in [1, 2] to solve the problem of detecting, localizing, and tracking pallets. Furthermore, we provide the community with an open repository of the dataset and code for further research. The dataset comprises 565 2D scans from real-world environments, divided into 340 samples in which pallets are present and 225 samples in which no pallets are present.