Many model-based Visual Odometry (VO) algorithms have been proposed in the past decade, often restricted to the type of camera optics, or the underlying motion manifold observed. We envision robots to be able to learn and perform these tasks, in a minimally supervised setting, as they gain more experience. To this end, we propose a fully trainable solution to visual ego-motion estimation for varied camera optics. We propose a visual ego-motion learning architecture that maps observed optical flow vectors to an ego-motion density estimate via a Mixture Density Network (MDN). By modeling the architecture as a Conditional Variational Autoencoder (C-VAE), our model is able to provide introspective reasoning and prediction for ego-motion induced scene-flow. Additionally, our proposed model is especially amenable to bootstrapped ego-motion learning in robots where the supervision in ego-motion estimation for a particular camera sensor can be obtained from standard navigation-based sensor fusion strategies (GPS/INS and wheel-odometry fusion). Through experiments, we show the utility of our proposed approach in enabling the concept of self-supervised learning for visual ego-motion estimation in autonomous robots.
Mapping and self-localization in unknown environments are fundamental capabilities in many robotic applications. These tasks typically involve the identification of objects as unique features or landmarks, which requires the objects both to be detected and then assigned a unique identifier that can be maintained when viewed from different perspectives and in different images. The \textitdata association and \textitsimultaneous localization and mapping (SLAM) problems are, individually, well-studied in the literature. But these two problems are inherently tightly coupled, and that has not been well-addressed. Without accurate SLAM, possible data associations are combinatorial and become intractable easily. Without accurate data association, the error of SLAM algorithms diverge easily. This paper proposes a novel nonparametric pose graph that models data association and SLAM in a single framework. An algorithm is further introduced to alternate between inferring data association and performing SLAM. Experimental results show that our approach has the new capability of associating object detections and localizing objects at the same time, leading to significantly better performance on both the data association and SLAM problems than achieved by considering only one and ignoring imperfections in the other.
Feb 01 2017 cs.RO
It is essential for a robot to be able to detect revisits or loop closures for long-term visual navigation.A key insight explored in this work is that the loop-closing event inherently occurs sparsely, that is, the image currently being taken matches with only a small subset (if any) of previous images. Based on this observation, we formulate the problem of loop-closure detection as a sparse, convex $\ell_1$-minimization problem. By leveraging fast convex optimization techniques, we are able to efficiently find loop closures, thus enabling real-time robot navigation. This novel formulation requires no offline dictionary learning, as required by most existing approaches, and thus allows online incremental operation. Our approach ensures a unique hypothesis by choosing only a single globally optimal match when making a loop-closure decision. Furthermore, the proposed formulation enjoys a flexible representation with no restriction imposed on how images should be represented, while requiring only that the representations are "close" to each other when the corresponding images are visually similar. The proposed algorithm is validated extensively using real-world datasets.
Dec 23 2016 cs.RO
Many important geometric estimation problems take the form of synchronization over the special Euclidean group: estimate the values of a set of poses given a set of relative measurements between them. This problem is typically formulated as a nonconvex maximum-likelihood estimation that is computationally hard to solve in general. Nevertheless, in this paper we present an algorithm that is able to efficiently recover certifiably globally optimal solutions of the special Euclidean synchronization problem in a non-adversarial noise regime. The crux of our approach is the development of a semidefinite relaxation of the maximum-likelihood estimation whose minimizer provides an exact MLE so long as the magnitude of the noise corrupting the available measurements falls below a certain critical threshold; furthermore, whenever exactness obtains, it is possible to verify this fact a posteriori, thereby certifying the optimality of the recovered estimate. We develop a specialized optimization scheme for solving large-scale instances of this relaxation by exploiting its low-rank, geometric, and graph-theoretic structure to reduce it to an equivalent optimization problem on a low-dimensional Riemannian manifold, and design a truncated-Newton trust-region method to solve this reduction efficiently. Finally, we combine this fast optimization approach with a simple rounding procedure to produce our algorithm, SE-Sync. Experimental evaluation on a variety of simulated and real-world pose-graph SLAM datasets shows that SE-Sync is able to recover certifiably globally optimal solutions when the available measurements are corrupted by noise up to an order of magnitude greater than that typically encountered in robotics and computer vision applications, and does so more than an order of magnitude faster than the Gauss-Newton-based approach that forms the basis of current state-of-the-art techniques.
Many geometric estimation problems take the form of synchronization over the special Euclidean group: estimate the values of a set of poses given noisy measurements of a subset of their pairwise relative transforms. This problem is typically formulated as a maximum-likelihood estimation that requires solving a nonconvex nonlinear program, which is computationally intractable in general. Nevertheless, in this paper we present an algorithm that is able to efficiently recover certifiably globally optimal solutions of this estimation problem in a non-adversarial noise regime. The crux of our approach is the development of a semidefinite relaxation of the maximum-likelihood estimation whose minimizer provides the exact MLE so long as the magnitude of the noise corrupting the available measurements falls below a certain critical threshold; furthermore, whenever exactness obtains, it is possible to verify this fact a posteriori, thereby certifying the optimality of the recovered estimate. We develop a specialized optimization scheme for solving large-scale instances of this semidefinite relaxation by exploiting its low-rank, geometric, and graph-theoretic structure to reduce it to an equivalent optimization problem on a low-dimensional Riemannian manifold, and then design a Riemannian truncated-Newton trust-region method to solve this reduction efficiently. We combine this fast optimization approach with a simple rounding procedure to produce our algorithm, SE-Sync. Experimental evaluation on a variety of simulated and real-world pose-graph SLAM datasets shows that SE-Sync is capable of recovering globally optimal solutions when the available measurements are corrupted by noise up to an order of magnitude greater than that typically encountered in robotics applications, and does so at a computational cost that scales comparably with that of direct Newton-type local search techniques.
We study the computational complexity of the Buttons \& Scissors game and obtain sharp thresholds with respect to several parameters. Specifically we show that the game is NP-complete for $C = 2$ colors but polytime solvable for $C = 1$. Similarly the game is NP-complete if every color is used by at most $F = 4$ buttons but polytime solvable for $F \leq 3$. We also consider restrictions on the board size, cut directions, and cut sizes. Finally, we introduce several natural two-player versions of the game and show that they are PSPACE-complete.
Jun 21 2016 cs.RO
Simultaneous Localization and Mapping (SLAM)consists in the concurrent construction of a model of the environment (the map), and the estimation of the state of the robot moving within it. The SLAM community has made astonishing progress over the last 30 years, enabling large-scale real-world applications, and witnessing a steady transition of this technology to industry. We survey the current state of SLAM. We start by presenting what is now the de-facto standard formulation for SLAM. We then review related work, covering a broad set of topics including robustness and scalability in long-term mapping, metric and semantic representations for mapping, theoretical performance guarantees, active SLAM and exploration, and other new frontiers. This paper simultaneously serves as a position paper and tutorial to those who are users of SLAM. By looking at the published research with a critical eye, we delineate open challenges and new research issues, that still deserve careful scientific investigation. The paper also contains the authors' take on two questions that often animate discussions during robotics conferences: Do robots need SLAM? and Is SLAM solved?
We present a novel RGB-D mapping system for generating 3D maps over spatially extended regions with higher resolution than current methods using multiple, dynamically placed mapping volumes. Our method takes in RGB-D frames and dynamically assigns multiple mapping volumes to the environment, exchanging mapping volumes between the CPU and GPU. Mapping volumes are added or removed as needed to allow for spatially extended, high resolution mapping. Our system is designed to maximize the resolution possible for such volumetric methods, while working on an unbounded space.
Traditional stereo algorithms have focused their efforts on reconstruction quality and have largely avoided prioritizing for run time performance. Robots, on the other hand, require quick maneuverability and effective computation to observe its immediate environment and perform tasks within it. In this work, we propose a high-performance and tunable stereo disparity estimation method, with a peak frame-rate of 120Hz (VGA resolution, on a single CPU-thread), that can potentially enable robots to quickly reconstruct their immediate surroundings and maneuver at high-speeds. Our key contribution is a disparity estimation algorithm that iteratively approximates the scene depth via a piece-wise planar mesh from stereo imagery, with a fast depth validation step for semi-dense reconstruction. The mesh is initially seeded with sparsely matched keypoints, and is recursively tessellated and refined as needed (via a resampling stage), to provide the desired stereo disparity accuracy. The inherent simplicity and speed of our approach, with the ability to tune it to a desired reconstruction quality and runtime performance makes it a compelling solution for applications in high-speed vehicles.
Sep 29 2015 cs.RO
Active SLAM is the task of actively planning robot paths while simultaneously building a map and localizing within. Existing work has focused on planning paths with occupancy grid maps, which do not scale well and suffer from long term drift. This work proposes a Topological Feature Graph (TFG) representation that scales well and develops an active SLAM algorithm with it. The TFG uses graphical models, which utilize independences between variables, and enables a unified quantification of exploration and exploitation gains with a single entropy metric. Hence, it facilitates a natural and principled balance between map exploration and refinement. A probabilistic roadmap path-planner is used to generate robot paths in real time. Experimental results demonstrate that the proposed approach achieves better accuracy than a standard grid-map based approach while requiring orders of magnitude less computation and memory resources.
In this work, we develop a monocular SLAM-aware object recognition system that is able to achieve considerably stronger recognition performance, as compared to classical object recognition systems that function on a frame-by-frame basis. By incorporating several key ideas including multi-view object proposals and efficient feature encoding methods, our proposed system is able to detect and robustly recognize objects in its environment using a single RGB camera in near-constant time. Through experiments, we illustrate the utility of using such a system to effectively detect and recognize objects, incorporating multiple object viewpoint detections into a unified prediction hypothesis. The performance of the proposed recognition system is evaluated on the UW RGB-D Dataset, showing strong recognition performance and scalable run-time performance compared to current state-of-the-art recognition systems.
State-of-the-art techniques for simultaneous localization and mapping (SLAM) employ iterative nonlinear optimization methods to compute an estimate for robot poses. While these techniques often work well in practice, they do not provide guarantees on the quality of the estimate. This paper shows that Lagrangian duality is a powerful tool to assess the quality of a given candidate solution. Our contribution is threefold. First, we discuss a revised formulation of the SLAM inference problem. We show that this formulation is probabilistically grounded and has the advantage of leading to an optimization problem with quadratic objective. The second contribution is the derivation of the corresponding Lagrangian dual problem. The SLAM dual problem is a (convex) semidefinite program, which can be solved reliably and globally by off-the-shelf solvers. The third contribution is to discuss the relation between the original SLAM problem and its dual. We show that from the dual problem, one can evaluate the quality (i.e., the suboptimality gap) of a candidate SLAM solution, and ultimately provide a certificate of optimality. Moreover, when the duality gap is zero, one can compute a guaranteed optimal SLAM solution from the dual problem, circumventing non-convex optimization. We present extensive (real and simulated) experiments supporting our claims and discuss practical relevance and open problems.
Jan 11 2011 cs.DC
Kautz and de Bruijn graphs have a high degree of connectivity which makes them ideal candidates for massively parallel computer network topologies. In order to realize a practical computer architecture based on these graphs, it is useful to have a means of constructing a large-scale system from smaller, simpler modules. In this paper we consider the mathematical problem of uniformly tiling a de Bruijn or Kautz graph. This can be viewed as a generalization of the graph bisection problem. We focus on the problem of graph tilings by a set of identical subgraphs. Tiles should contain a maximal number of internal edges so as to minimize the number of edges connecting distinct tiles. We find necessary and sufficient conditions for the construction of tilings. We derive a simple lower bound on the number of edges which must leave each tile, and construct a class of tilings whose number of edges leaving each tile agrees asymptotically in form with the lower bound to within a constant factor. These tilings make possible the construction of large-scale computing systems based on de Bruijn and Kautz graph topologies.