This paper describes autonomous racing of RC race cars based on mathematical optimization. Using a dynamical model of the vehicle, control inputs are computed by receding horizon based controllers, where the objective is to maximize progress on the track subject to the requirement of staying on the track and avoiding opponents. Two different control formulations are presented. The first controller employs a two-level structure, consisting of a path planner and a nonlinear model predictive controller (NMPC) for tracking. The second controller combines both tasks in one nonlinear optimization problem (NLP) following the ideas of contouring control. Linear time varying models obtained by linearization are used to build local approximations of the control NLPs in the form of convex quadratic programs (QPs) at each sampling time. The resulting QPs have a typical MPC structure and can be solved in the range of milliseconds by recent structure exploiting solvers, which is key to the real-time feasibility of the overall control scheme. Obstacle avoidance is incorporated by means of a high-level corridor planner based on dynamic programming, which generates convex constraints for the controllers according to the current position of opponents and the track layout. The control performance is investigated experimentally using 1:43 scale RC race cars, driven at speeds of more than 3 m/s and in operating regions with saturated rear tire forces (drifting). The algorithms run at 50 Hz sampling rate on embedded computing platforms, demonstrating the real-time feasibility and high performance of optimization-based approaches for autonomous racing.
Zero-shot learning (ZSL) aims to recognize objects from novel unseen classes without any training data. Recently, structure-transfer based methods are proposed to implement ZSL by transferring structural knowledge from the semantic embedding space to image feature space to classify testing images. However, we observe that such a knowledge transfer framework may suffer from the problem of the geometric inconsistency between the data in the training and testing spaces. We call this problem as the space shift problem. In this paper, we propose a novel graph based method to alleviate this space shift problem. Specifically, a Shared Reconstruction Graph (SRG) is pursued to capture the common structure of data in the two spaces. With the learned SRG, each unseen class prototype (cluster center) in the image feature space can be synthesized by the linear combination of other class prototypes, so that testing instances can be classified based on the distance to these synthesized prototypes. The SRG bridges the image feature space and semantic embedding space. By applying spectral clustering on the learned SRG, many meaningful clusters can be discovered, which interprets ZSL performance on the datasets. Our method can be easily extended to the generalized zero-shot learning setting. Experiments on three popular datasets show that our method outperforms other methods on all datasets. Even with a small number of training samples, our method can achieve the state-of-the-art performance.
Deep learning based image steganalysis has attracted increasing attentions in recent years. Several Convolutional Neural Network (CNN) models have been proposed and achieved state-of-the-art performances on detecting steganography. In this paper, we explore an important technique in deep learning, the batch normalization, for the task of image steganalysis. Different from natural image classification, steganalysis is to discriminate cover images and stego images which are the result of adding weak stego signals into covers. This characteristic makes a cover image is more statistically similar to its stego than other cover images, requiring steganalytic methods to use paired learning to extract effective features for image steganalysis. Our theoretical analysis shows that a CNN model with multiple normalization layers is hard to be generalized to new data in the test set when it is well trained with paired learning. To hand this difficulty, we propose a novel normalization technique called Shared Normalization (SN) in this paper. Unlike the batch normalization layer utilizing the mini-batch mean and standard deviation to normalize each input batch, SN shares same statistics for all training and test batches. Based on the proposed SN layer, we further propose a novel neural network model for image steganalysis. Extensive experiments demonstrate that the proposed network with SN layers is stable and can detect the state of the art steganography with better performances than previous methods.
We develop a Computer Aided Diagnosis (CAD) system, which enhances the performance of dentists in detecting wide range of dental caries. The CAD System achieves this by acting as a second opinion for the dentists with way higher sensitivity on the task of detecting cavities than the dentists themselves. We develop annotated dataset of more than 3000 bitewing radiographs and utilize it for developing a system for automated diagnosis of dental caries. Our system consists of a deep fully convolutional neural network (FCNN) consisting 100+ layers, which is trained to mark caries on bitewing radiographs. We have compared the performance of our proposed system with three certified dentists for marking dental caries. We exceed the average performance of the dentists in both recall (sensitivity) and F1-Score (agreement with truth) by a very large margin. Working example of our system is shown in Figure 1.
The topic of multi-person pose estimation has been largely improved recently, especially with the development of convolutional neural network. However, there still exist a lot of challenging cases, such as occluded keypoints, invisible keypoints and complex background, which cannot be well addressed. In this paper, we present a novel network structure called Cascaded Pyramid Network (CPN) which targets to relieve the problem from these "hard" keypoints. More specifically, our algorithm includes two stages: GlobalNet and RefineNet. GlobalNet is a feature pyramid network which can successfully localize the "simple" keypoints like eyes and hands but may fail to precisely recognize the occluded or invisible keypoints. Our RefineNet tries explicitly handling the "hard" keypoints by integrating all levels of feature representations from the GlobalNet together with an online hard keypoint mining loss. In general, to address the multi-person pose estimation problem, a top-down pipeline is adopted to first generate a set of human bounding boxes based on a detector, followed by our CPN for keypoint localization in each human bounding box. Based on the proposed algorithm, we achieve state-of-art results on the COCO keypoint benchmark, with average precision at 73.0 on the COCO test-dev dataset and 72.1 on the COCO test-challenge dataset, which is a 19% relative improvement compared with 60.5 from the COCO 2016 keypoint challenge.
We analyse how the standard reductions between constraint satisfaction problems affect their proof complexity. We show that, for the most studied propositional, algebraic, and semi-algebraic proof systems, the classical constructions of pp-interpretability, homomorphic equivalence and addition of constants to a core preserve the proof complexity of the CSP. As a result, for those proof systems, the classes of constraint languages for which small unsatisfiability certificates exist can be characterised algebraically. We illustrate our results by a gap theorem saying that a constraint language either has resolution refutations of constant width, or does not have bounded-depth Frege refutations of subexponential size. The former holds exactly for the widely studied class of constraint languages of bounded width. This class is also known to coincide with the class of languages with refutations of sublinear degree in Sums-of-Squares and Polynomial Calculus over the real-field, for which we provide alternative proofs. We then ask for the existence of a natural proof system with good behaviour with respect to reductions and simultaneously small size refutations beyond bounded width. We give an example of such a proof system by showing that bounded-degree Lovász-Schrijver satisfies both requirements. Finally, building on the known lower bounds, we demonstrate the applicability of the method of reducibilities and construct new explicit hard instances of the graph 3-coloring problem for all studied proof systems.
Robots operate in environments with varying implicit structure. For instance, a helicopter flying over terrain encounters a very different arrangement of obstacles than a robotic arm manipulating objects on a cluttered table top. State-of-the-art motion planning systems do not exploit this structure, thereby expending valuable planning effort searching for implausible solutions. We are interested in planning algorithms that actively infer the underlying structure of the valid configuration space during planning in order to find solutions with minimal effort. Consider the problem of evaluating edges on a graph to quickly discover collision-free paths. Evaluating edges is expensive, both for robots with complex geometries like robot arms, and for robots with limited onboard computation like UAVs. Until now, this challenge has been addressed via laziness i.e. deferring edge evaluation until absolutely necessary, with the hope that edges turn out to be valid. However, all edges are not alike in value - some have a lot of potentially good paths flowing through them, and some others encode the likelihood of neighbouring edges being valid. This leads to our key insight - instead of passive laziness, we can actively choose edges that reduce the uncertainty about the validity of paths. We show that this is equivalent to the Bayesian active learning paradigm of decision region determination (DRD). However, the DRD problem is not only combinatorially hard, but also requires explicit enumeration of all possible worlds. We propose a novel framework that combines two DRD algorithms, DIRECT and BISECT, to overcome both issues. We show that our approach outperforms several state-of-the-art algorithms on a spectrum of planning problems for mobile robots, manipulators and autonomous helicopters.
By lifting the ReLU function into a higher dimensional space, we develop a smooth multi-convex formulation for training feed-forward deep neural networks (DNNs). This allows us to develop a block coordinate descent (BCD) training algorithm consisting of a sequence of numerically well-behaved convex optimizations. Using ideas from proximal point methods in convex analysis, we prove that this BCD algorithm will converge globally to a stationary point with R-linear convergence rate of order one. In experiments with the MNIST database, DNNs trained with this BCD algorithm consistently yielded better test-set error rates than identical DNN architectures trained via all the stochastic gradient descent (SGD) variants in the Caffe toolbox.
Neural networks have demonstrated considerable success in a wide variety of real-world problems. However, the presence of adversarial examples - slightly perturbed inputs that are misclassified with high confidence - limits our ability to guarantee performance for these networks in safety-critical applications. We demonstrate that, for networks that are piecewise affine (for example, deep networks with ReLU and maxpool units), proving no adversarial example exists - or finding the closest example if one does exist - can be naturally formulated as solving a mixed integer program. Solves for a fully-connected MNIST classifier with three hidden layers can be completed an order of magnitude faster than those of the best existing approach. To address the concern that adversarial examples are irrelevant because pixel-wise attacks are unlikely to happen in natural images, we search for adversaries over a natural class of perturbations written as convolutions with an adversarial blurring kernel. When searching over blurred images, we find that as opposed to pixelwise attacks, some misclassifications are impossible. Even more interestingly, a small fraction of input images are provably robust to blurs: every blurred version of the input is classified with the same, correct label.
We study a classification problem where each feature can be acquired for a cost and the goal is to optimize the trade-off between classification precision and the total feature cost. We frame the problem as a sequential decision-making problem, where we classify one sample in each episode. At each step, an agent can use values of acquired features to decide whether to purchase another one or whether to classify the sample. We use vanilla Double Deep Q-learning, a standard reinforcement learning technique, to find a classification policy. We show that this generic approach outperforms Adapt-Gbrt, currently the best-performing algorithm developed specifically for classification with costly features.