- We prove a characterization of $t$-query quantum algorithms in terms of the unit ball of a space of degree-$2t$ polynomials. Based on this, we obtain a refined notion of approximate polynomial degree that equals the quantum query complexity, answering a question of Aaronson et al. (CCC'16). Our proof is based on a fundamental result of Christensen and Sinclair (J. Funct. Anal., 1987) that generalizes the well-known Stinespring representation for quantum channels to multilinear forms. Using our characterization, we show that many polynomials of degree four are far from those coming from two-query quantum algorithms. We also give a simple and short proof of one of the results of Aaronson et al. showing an equivalence between one-query quantum algorithms and bounded quadratic polynomials.
- We study the problem of list-decodable Gaussian mean estimation and the related problem of learning mixtures of separated spherical Gaussians. We develop a set of techniques that yield new efficient algorithms with significantly improved guarantees for these problems. \bf List-Decodable Mean Estimation. Fix any $d \in \mathbb{Z}_+$ and $0< \alpha <1/2$. We design an algorithm with runtime $O (\mathrm{poly}(n/\alpha)^{d})$ that outputs a list of $O(1/\alpha)$ many candidate vectors such that with high probability one of the candidates is within $\ell_2$-distance $O(\alpha^{-1/(2d)})$ from the true mean. The only previous algorithm for this problem achieved error $\tilde O(\alpha^{-1/2})$ under second moment conditions. For $d = O(1/\epsilon)$, our algorithm runs in polynomial time and achieves error $O(\alpha^{\epsilon})$. We also give a Statistical Query lower bound suggesting that the complexity of our algorithm is qualitatively close to best possible. \bf Learning Mixtures of Spherical Gaussians. We give a learning algorithm for mixtures of spherical Gaussians that succeeds under significantly weaker separation assumptions compared to prior work. For the prototypical case of a uniform mixture of $k$ identity covariance Gaussians we obtain: For any $\epsilon>0$, if the pairwise separation between the means is at least $\Omega(k^{\epsilon}+\sqrt{\log(1/\delta)})$, our algorithm learns the unknown parameters within accuracy $\delta$ with sample complexity and running time $\mathrm{poly} (n, 1/\delta, (k/\epsilon)^{1/\epsilon})$. The previously best known polynomial time algorithm required separation at least $k^{1/4} \mathrm{polylog}(k/\delta)$. Our main technical contribution is a new technique, using degree-$d$ multivariate polynomials, to remove outliers from high-dimensional datasets where the majority of the points are corrupted.
- Coherence, Quantum Fisher Information, Superradiance and Entanglement are Interconvertible ResourcesNov 21 2017 quant-ph arXiv:1711.07185v1We demonstrate that quantum Fisher information and superradiance can be formulated as coherence measures in accordance with the resource theory of coherence, thus establishing a direct link between metrological information, superradiance and coherence. We also show that the trivial coherence measure has a metrological interpretation. The arguments are then generalized to show that coherence may be considered as the underlying fundamental resource for any functional of state that is first of all faithful, and second of all concave or linear. Arguments are then presented that show quantum Fisher information and the superradiant quantity are in fact antithetical resources in the sense that if coherence were directed to saturate one quantity, then it must come at the expense of the other. Finally, a key result of the paper is to demonstrate that coherence, quantum Fisher information, superradiant quantity, and entanglement are in fact mutually interconvertible resources under incoherent operations.
- Nov 21 2017 cs.CV arXiv:1711.07302v1Zero-shot learning (ZSL) aims to recognize objects from novel unseen classes without any training data. Recently, structure-transfer based methods are proposed to implement ZSL by transferring structural knowledge from the semantic embedding space to image feature space to classify testing images. However, we observe that such a knowledge transfer framework may suffer from the problem of the geometric inconsistency between the data in the training and testing spaces. We call this problem as the space shift problem. In this paper, we propose a novel graph based method to alleviate this space shift problem. Specifically, a Shared Reconstruction Graph (SRG) is pursued to capture the common structure of data in the two spaces. With the learned SRG, each unseen class prototype (cluster center) in the image feature space can be synthesized by the linear combination of other class prototypes, so that testing instances can be classified based on the distance to these synthesized prototypes. The SRG bridges the image feature space and semantic embedding space. By applying spectral clustering on the learned SRG, many meaningful clusters can be discovered, which interprets ZSL performance on the datasets. Our method can be easily extended to the generalized zero-shot learning setting. Experiments on three popular datasets show that our method outperforms other methods on all datasets. Even with a small number of training samples, our method can achieve the state-of-the-art performance.
- Nov 21 2017 quant-ph arXiv:1711.07216v1Presently, one of the most ambitious technological goals is the development of devices working under the laws of quantum mechanics. One prominent target is the quantum computer, which would allow the processing of information at quantum level for purposes not achievable with even the most powerful computer resources. The large-scale implementation of quantum information would be a game changer for current technology, because it would allow unprecedented parallelised computation and secure encryption based on the principles of quantum superposition and entanglement. Currently, there are several physical platforms racing to achieve the level of performance required for the quantum hardware to step into the realm of practical quantum information applications. Several materials have been proposed to fulfil this task, ranging from quantum dots, Bose-Einstein condensates, spin impurities, superconducting circuits, molecules, amongst others. Magnetic molecules are among the list of promising building blocks, due to (i) their intrinsic monodispersity, (ii) discrete energy levels (iii) the possibility of chemical quantum state engineering, and (iv) their multilevel characteristics, leading to the so called Qudits (d > 2), amongst others. Herein we review how a molecular multilevel nuclear spin qubit (or qudit, where d = 4), known as TbPc2, gathers all the necessary requirements to perform as a molecular hardware platform with a first generation of molecular devices enabling even quantum algorithm operations.
- A common challenge faced in many branches of quantum physics is finding the extremal eigenvalues and eigenvectors of a Hamiltonian matrix too large to store in computer memory. There are numerous efficient methods developed for this task, but they generally fail when some control parameter in the Hamiltonian matrix such as interaction coupling exceeds some threshold value. In this work we present a new technique called eigenvector continuation that can extend the reach of these methods. Borrowing some concepts from machine learning, the key insight is that while an eigenvector resides in a linear space that may have enormous dimensions, the effective dimensionality of the eigenvector trajectory traced out by the one-parameter family of Hamiltonian matrices is small. We prove this statement using analytic function theory and propose a algorithm to solve for the extremal eigenvectors. We benchmark the method using several examples from quantum many-body theory.
- The zero dynamics of infinite-dimensional systems can be difficult to characterize. The zero dynamics of boundary control systems are particularly problematic. In this paper the zero dynamics of port-Hamiltonian systems are studied. A complete characterization of the zero dynamics for a port-Hamiltonian systems with invertible feedthrough as another port-Hamiltonian system on the same state space is given. It is shown that the zero dynamics for any port-Hamiltonian system with commensurate wave speeds are well-defined, and are also a port-Hamiltonian system. Examples include wave equations with uniform wave speed on a network. A constructive procedure for calculation of the zero dynamics, that can be used for very large system order, is provided.
- Nov 21 2017 physics.chem-ph cond-mat.stat-mech arXiv:1711.07326v1Entropy concept was introduced by Clausius 160 years ago, and has been continually enriched, developed and interpreted by the researchers in many different scientific disciplines ever since. Thermodynamics and other scientific disciplines face several simple but crucial questions concerning the entropy concept. Thus, it is frequently possible to notice a misuse of entropy. Sometimes unbelievable confusion in the literature is summarized by von Neumann's sentence: "No one knows what entropy really is." Four major questions stand before thermodynamics. (1) How many kinds of entropy are there? (2) What is the physical meaning of entropy? (3) Is entropy a subjective or an objective property? (4) Is entropy in any way related to information? This paper attempts to describe the roots, the conceptual history of this important concept and to describe the path of its development and application in different scientific disciplines. Through this we attempt to give some possible answers to the questions mentioned above.
- The detection of gravitational waves with LIGO and Virgo requires a detailed understanding of the response of these instruments in the presence of environmental and instrumental noise. Of particular interest is the study of non-Gaussian noise transients known as glitches, since their high occurrence rate in LIGO/Virgo data can obscure or even mimic true gravitational wave signals. Therefore, successfully identifying and excising glitches is of utmost importance to detect and characterize gravitational waves. In this article, we present the first application of Deep Learning combined with Transfer Learning for glitch classification, using real data from LIGO's first discovery campaign labeled by Gravity Spy, showing that knowledge from pre-trained models for real-world object recognition can be transferred for classifying spectrograms of glitches. We demonstrate that this method enables the optimal use of very deep convolutional neural networks for glitch classification given small unbalanced training datasets, significantly reduces the training time, and achieves state-of-the-art accuracy above 98.8%. Once trained via transfer learning, we show that the networks can be truncated and used as feature extractors for unsupervised clustering to automatically group new classes of glitches. This feature is of critical importance to identify and remove new types of glitches which will occur as the LIGO/Virgo detectors gradually attain design sensitivity.
- Nov 21 2017 cs.LG arXiv:1711.07465v1We develop a new family of convex relaxations for $k$-means clustering based on sum-of-squares norms, a relaxation of the injective tensor norm that is efficiently computable using the Sum-of-Squares algorithm. We give an algorithm based on this relaxation that recovers a faithful approximation to the true means in the given data whenever the low-degree moments of the points in each cluster have bounded sum-of-squares norms. We then prove a sharp upper bound on the sum-of-squares norms for moment tensors of any distribution that satisfies the \emphPoincare inequality. The Poincare inequality is a central inequality in probability theory, and a large class of distributions satisfy it including Gaussians, product distributions, strongly log-concave distributions, and any sum or uniformly continuous transformation of such distributions. As an immediate corollary, for any $\gamma > 0$, we obtain an efficient algorithm for learning the means of a mixture of $k$ arbitrary \Poincare distributions in $\mathbb{R}^d$ in time $d^{O(1/\gamma)}$ so long as the means have separation $\Omega(k^{\gamma})$. This in particular yields an algorithm for learning Gaussian mixtures with separation $\Omega(k^{\gamma})$, thus partially resolving an open problem of Regev and Vijayaraghavan \citetregev2017learning. Our algorithm works even in the outlier-robust setting where an $\epsilon$ fraction of arbitrary outliers are added to the data, as long as the fraction of outliers is smaller than the smallest cluster. We, therefore, obtain results in the strong agnostic setting where, in addition to not knowing the distribution family, the data itself may be arbitrarily corrupted.
- Brain-Machine Interaction (BMI) system motivates interesting and promising results in forward/feedback control consistent with human intention. It holds great promise for advancements in patient care and applications to neurorehabilitation. Here, we propose a novel neurofeedback-based BCI robotic platform using a personalized social robot in order to assist patients having cognitive deficits through bilateral rehabilitation and mental training. For initial testing of the platform, electroencephalography (EEG) brainwaves of a human user were collected in real time during tasks of imaginary movements. First, the brainwaves associated with imagined body kinematics parameters were decoded to control a cursor on a computer screen in training protocol. Then, the experienced subject was able to interact with a social robot via our real-time BMI robotic platform. Corresponding to subject's imagery performance, he/she received specific gesture movements and eye color changes as neural-based feedback from the robot. This hands-free neurofeedback interaction not only can be used for mind control of a social robot's movements, but also sets the stage for application to enhancing and recovering mental abilities such as attention via training in humans by providing real-time neurofeedback from a social robot.
- Conditional variants of Generative Adversarial Networks (GANs), known as cGANs, are generative models that can produce data samples ($x$) conditioned on both latent variables ($z$) and known auxiliary information ($c$). Another GAN variant, Bidirectional GAN (BiGAN) is a recently developed framework for learning the inverse mapping from $x$ to $z$ through an encoder trained simultaneously with the generator and the discriminator of an unconditional GAN. We propose the Bidirectional Conditional GAN (BCGAN), which combines cGANs and BiGANs into a single framework with an encoder that learns inverse mappings from $x$ to both $z$ and $c$, trained simultaneously with the conditional generator and discriminator in an end-to-end setting. We present crucial techniques for training BCGANs, which incorporate an extrinsic factor loss along with an associated dynamically-tuned importance weight. As compared to other encoder-based GANs, BCGANs not only encode $c$ more accurately but also utilize $z$ and $c$ more effectively and in a more disentangled way to generate data samples.
- While deep neural networks have been shown in recent years to outperform other machine learning methods in a wide range of applications, one of the biggest challenges with enabling deep neural networks for widespread deployment on edge devices such as mobile and other consumer devices is high computational and memory requirements. Recently, there has been greater exploration into small deep neural network architectures that are more suitable for edge devices, with one of the most popular architectures being SqueezeNet, with an incredibly small model size of 4.8MB. Taking further advantage of the notion that many applications of machine learning on edge devices are often characterized by a low number of target classes, this study explores the utility of combining architectural modifications and an evolutionary synthesis strategy for synthesizing even smaller deep neural architectures based on the more recent SqueezeNet v1.1 macroarchitecture for applications with fewer target classes. In particular, architectural modifications are first made to SqueezeNet v1.1 to accommodate for a 10-class ImageNet-10 dataset, and then an evolutionary synthesis strategy is leveraged to synthesize more efficient deep neural networks based on this modified macroarchitecture. The resulting SquishedNets possess model sizes ranging from 2.4MB to 0.95MB (~5.17X smaller than SqueezeNet v1.1, or 253X smaller than AlexNet). Furthermore, the SquishedNets are still able to achieve accuracies ranging from 81.2% to 77%, and able to process at speeds of 156 images/sec to as much as 256 images/sec on a Nvidia Jetson TX1 embedded chip. These preliminary results show that a combination of architectural modifications and an evolutionary synthesis strategy can be a useful tool for producing very small deep neural network architectures that are well-suited for edge device scenarios.
- Nov 21 2017 cs.DS arXiv:1711.07454v1We use the Sum of Squares method to develop new efficient algorithms for learning well-separated mixtures of Gaussians and robust mean estimation, both in high dimensions, that substantially improve upon the statistical guarantees achieved by previous efficient algorithms. Firstly, we study mixtures of $k$ distributions in $d$ dimensions, where the means of every pair of distributions are separated by at least $k^{\varepsilon}$. In the special case of spherical Gaussian mixtures, we give a $(dk)^{O(1/\varepsilon^2)}$-time algorithm that learns the means assuming separation at least $k^{\varepsilon}$, for any $\varepsilon > 0$. This is the first algorithm to improve on greedy ("single-linkage") and spectral clustering, breaking a long-standing barrier for efficient algorithms at separation $k^{1/4}$. We also study robust estimation. When an unknown $(1-\varepsilon)$-fraction of $X_1,\ldots,X_n$ are chosen from a sub-Gaussian distribution with mean $\mu$ but the remaining points are chosen adversarially, we give an algorithm recovering $\mu$ to error $\varepsilon^{1-1/t}$ in time $d^{O(t^2)}$, so long as sub-Gaussian-ness up to $O(t)$ moments can be certified by a Sum of Squares proof. This is the first polynomial-time algorithm with guarantees approaching the information-theoretic limit for non-Gaussian distributions. Previous algorithms could not achieve error better than $\varepsilon^{1/2}$. Both of these results are based on a unified technique. Inspired by recent algorithms of Diakonikolas et al. in robust statistics, we devise an SDP based on the Sum of Squares method for the following setting: given $X_1,\ldots,X_n \in \mathbb{R}^d$ for large $d$ and $n = poly(d)$ with the promise that a subset of $X_1,\ldots,X_n$ were sampled from a probability distribution with bounded moments, recover some information about that distribution.
- Data driven research on Android has gained a great momentum these years. The abundance of data facilitates knowledge learning, however, also increases the difficulty of data preprocessing. Therefore, it is non-trivial to prepare a demanding and accurate set of data for research. In this work, we put forward AndroVault, a framework for the Android research composing of data collection, knowledge representation and knowledge extraction. It has started with a long-running web crawler for data collection (both apps and description) since 2013, which guarantees the timeliness of data; With static analysis and dynamic analysis of the collected data, we compute a variety of attributes to characterize Android apps. After that, we employ a knowledge graph to connect all these apps by computing their correlation in terms of attributes; Last, we leverage multiple technologies such as logical inference, machine learning, and correlation analysis to extract facts (more accurate and demanding, either high level or not, data) that are beneficial for a specific research problem. With the produced data of high quality, we have successfully conducted many research works including malware detection, code generation, and Android testing. We would like to release our data to the research community in an authenticated manner, and encourage them to conduct productive research.
- The Morris Water Maze is commonly used in behavioural neuroscience for the study of spatial learning with rodents. Over the years, various methods of analysing rodent data collected in this task have been proposed. These methods span from classical performance measurements (e.g. escape latency, rodent speed, quadrant preference) to more sophisticated methods of categorisation which classify the animal swimming path into behavioural classes known as strategies. Classification techniques provide additional insight in relation to the actual animal behaviours but still only a limited amount of studies utilise them mainly because they highly depend on machine learning knowledge. We have previously demonstrated that the animals implement various strategies and by classifying whole trajectories can lead to the loss of important information. In this work, we developed a generalised and robust classification methodology which implements majority voting to boost the classification performance and successfully nullify the need of manual tuning. Based on this framework, we built a complete software, capable of performing the full analysis described in this paper. The software provides an easy to use graphical user interface (GUI) through which users can enter their trajectory data, segment and label them and finally generate reports and figures of the results.
- Minimizing job scheduling time is a fundamental issue in data center networks that has been extensively studied in recent years. The incoming jobs require different CPU and memory units, and span different number of time slots. The traditional solution is to design efficient heuristic algorithms with performance guarantee under certain assumptions. In this paper, we improve a recently proposed job scheduling algorithm using deep reinforcement learning and extend it to multiple server clusters. Our study reveals that deep reinforcement learning method has the potential to outperform traditional resource allocation algorithms in a variety of complicated environments.
- Nov 21 2017 cs.LG arXiv:1711.07437v1Nonnegative Matrix Factorization (NMF) is a widely used technique for data representation. Inspired by the expressive power of deep learning, several NMF variants equipped with deep architectures have been proposed. However, these methods mostly use the only nonnegativity while ignoring task-specific features of data. In this paper, we propose a novel deep approximately orthogonal nonnegative matrix factorization method where both nonnegativity and orthogonality are imposed with the aim to perform a hierarchical clustering by using different level of abstractions of data. Experiment on two face image datasets showed that the proposed method achieved better clustering performance than other deep matrix factorization methods and state-of-the-art single layer NMF variants.
- Nov 21 2017 math.SP arXiv:1711.07435v1Studying the spectral theory of Schroedinger operator on metric graphs (also known as quantum graphs) is advantageous on its own and also as a demonstration of key concepts of general spectral theory. There are some excellent references for this study such as a mathematically oriented book by Berkolaiko and Kuchment, a review with applications to theoretical physicsby Gnutzmann and Smilansky, and elementary lecture notes by Berkolaiko. Here we provide a set of questions and exercises which can accompany the reading of these references or an elementary course on quantum graphs. The exercises are taken from courses on quantum graphs which were taught by the authors.
- Pairwise "same-cluster" queries are one of the most widely used forms of supervision in semi-supervised clustering. However, it is impractical to ask human oracles to answer every query correctly. In this paper, we study the influence of allowing "not-sure" answers from a weak oracle and propose an effective algorithm to handle such uncertainties in query responses. Two realistic weak oracle models are considered where ambiguity in answering depends on the distance between two points. We show that a small query complexity is adequate for effective clustering with high probability by providing better pairs to the weak oracle. Experimental results on synthetic and real data show the effectiveness of our approach in overcoming supervision uncertainties and yielding high quality clusters.
- Action recognition is an important yet challenging task in computer vision. In this paper, we propose a novel deep-based framework for action recognition, which improves the recognition accuracy by: 1) deriving more precise features for representing actions, and 2) reducing the asynchrony between different information streams. We first introduce a coarse-to-fine network which extracts shared deep features at different action class granularities and progressively integrates them to obtain a more accurate feature representation for input actions. We further introduce an asynchronous fusion network. It fuses information from different streams by asynchronously integrating stream-wise features at different time points, hence better leveraging the complementary information in different streams. Experimental results on action recognition benchmarks demonstrate that our approach achieves the state-of-the-art performance.
- Nov 21 2017 cs.CV arXiv:1711.07426v12D object detection is the task of finding (i) what objects are present in an image and (ii) where they are located, while 3D pose estimation is the task of finding the pose of these objects in 3D space. State-of-the-art methods for solving these tasks follow a two-stage approach where a 3D pose estimation system is applied to bounding boxes (with associated category labels) returned by a 2D detection method. This paper addresses the task of joint object category and 3D pose estimation given a 2D bounding box. We design a residual network based architecture to solve these two seemingly orthogonal tasks with new category-dependent pose outputs and loss functions, and show state-of-the-art performance on the challenging Pascal3D+ dataset.
- A core aspect of human intelligence is the ability to learn new tasks quickly and switch between them flexibly. Here, we describe a modular continual reinforcement learning paradigm inspired by these abilities. We first introduce a visual interaction environment that allows many types of tasks to be unified in a single framework. We then describe a reward map prediction scheme that learns new tasks robustly in the very large state and action spaces required by such an environment. We investigate how properties of module architecture influence efficiency of task learning, showing that a module motif incorporating specific design principles (e.g. early bottlenecks, low-order polynomial nonlinearities, and symmetry) significantly outperforms more standard neural network motifs, needing fewer training examples and fewer neurons to achieve high levels of performance. Finally, we present a meta-controller architecture for task switching based on a dynamic neural voting scheme, which allows new modules to use information learned from previously-seen tasks to substantially improve their own learning efficiency.
- Nov 21 2017 cs.CV arXiv:1711.07419v1In interactive medical image segmentation, anatomical structures are extracted from reconstructed volumetric images. The first iterations of user interaction traditionally consist of drawing pictorial hints as an initial estimate of the object to extract. Only after this time consuming first phase, the efficient selective refinement of current segmentation results begins. Erroneously labeled seeds, especially near the border of the object, are challenging to detect and replace for a human and may substantially impact the overall segmentation quality. We propose an automatic seeding pipeline as well as a configuration based on saliency recognition, in order to skip the time-consuming initial interaction phase during segmentation. A median Dice score of 68.22% is reached before the first user interaction on the test data set with an error rate in seeding of only 0.088%.
- Evidence suggests that both the interaction of so-called Merkel cells and the epidermal stress distribution play an important role in the formation of fingerprint patterns during pregnancy. To model the formation of fingerprint patterns in a biologically meaningful way these patterns have to become stationary. For the creation of synthetic fingerprints it is also very desirable that rescaling the model parameters leads to rescaled distances between the stationary fingerprint ridges. Based on these observations, as well as the model introduced by Kücken and Champod we propose a new model for the formation of fingerprint patterns during pregnancy. In this anisotropic interaction model the interaction forces not only depend on the distance vector between the cells and the model parameters, but additionally on an underlying tensor field, representing a stress field. This dependence on the tensor field leads to complex, anisotropic patterns. We study the resulting stationary patterns both analytically and numerically. In particular, we show that fingerprint patterns can be modeled as stationary solutions by choosing the underlying tensor field appropriately.
- Transparency, user trust, and human comprehension are popular ethical motivations for interpretable machine learning. In support of these goals, researchers evaluate model explanation performance using humans and real world applications. This alone presents a challenge in many areas of artificial intelligence. In this position paper, we propose a distinction between descriptive and persuasive explanations. We discuss reasoning suggesting that functional interpretability may be correlated with cognitive function and user preferences. If this is indeed the case, evaluation and optimization using functional metrics could perpetuate implicit cognitive bias in explanations that threaten transparency. Finally, we propose two potential research directions to disambiguate cognitive function and explanation models, retaining control over the tradeoff between accuracy and interpretability.
- Nov 21 2017 cs.CL arXiv:1711.07404v1One of the most crucial components of natural human-robot interaction is artificial intuition and its influence on dialog systems. The intuitive capability that humans have is undeniably extraordinary, and so remains one of the greatest challenges for natural communicative dialogue between humans and robots. In this paper, we introduce a novel probabilistic modeling framework of identifying, classifying and learning features of sarcastic text via training a neural network with human-informed sarcastic benchmarks. This is necessary for establishing a comprehensive sentiment analysis schema that is sensitive to the nuances of sarcasm-ridden text by being trained on linguistic cues. We show that our model provides a good fit for this type of real-world informed data, with potential to achieve as accurate, if not more, than alternatives. Though the implementation and benchmarking is an extensive task, it can be extended via the same method that we present to capture different forms of nuances in communication and making for much more natural and engaging dialogue systems.
- Nov 21 2017 cs.CV arXiv:1711.07399v1Most of the existing deep learning-based methods for 3D hand and human pose estimation from a single depth map are based on a common framework that takes a 2D depth map and directly regresses the 3D coordinates of keypoints, such as hand or human body joints, via 2D convolutional neural networks (CNNs). The first weakness of this approach is the presence of perspective distortion in the 2D depth map. While the depth map is intrinsically 3D data, many previous methods treat depth maps as 2D images that can distort the shape of the actual object through projection from 3D to 2D space. This compels the network to perform perspective distortion-invariant estimation. The second weakness of the conventional approach is that directly regressing 3D coordinates from a 2D image is a highly non-linear mapping, which causes difficulty in the learning procedure. To overcome these weaknesses, we firstly cast the 3D hand and human pose estimation problem from a single depth map into a voxel-to-voxel prediction that uses a 3D voxelized grid and estimates the per-voxel likelihood for each keypoint. We design our model as a 3D CNN that provides accurate estimates while running in real-time. Our system outperforms previous methods in almost all publicly available 3D hand and human pose estimation datasets and placed first in the HANDS 2017 frame-based 3D hand pose estimation challenge.
- Nov 21 2017 hep-ph arXiv:1711.07388v1We present the Fortran95 program Recola2 for the perturbative computation of next-to-leading-order transition amplitudes in the Standard Model of particle physics and extended Higgs sectors. New theories are implemented via model files in the 't Hooft-Feynman gauge in the conventional formulation of quantum field theory and in the Background-Field method. The present version includes model files for the Two-Higgs-Doublet Model and the Higgs-Singlet Extension of the Standard Model. We support standard renormalization schemes for the Standard Model as well as many commonly used renormalization schemes in extended Higgs sectors. Within these models the computation of next-to-leading-order polarized amplitudes and squared amplitudes, optionally summed over spin and colour, is fully automated for any process. Recola2 allows the computation of colour- and spin-correlated leading-order squared amplitudes that are needed in the dipole subtraction formalism. Recola2 is publicly available for download at http://recola.hepforge.org.
- Nov 21 2017 cs.CV arXiv:1711.07377v1In this paper, we propose a novel pixel-wise visual object tracking framework that can track any anonymous object in a noisy background. The framework consists of two submodels, a global attention model and a local segmentation model. The global model generates a region of interests (ROI) that the object may lie in the new frame based on the past object segmentation maps, while the local model segments the new image in the ROI. Each model uses a LSTM structure to model the temporal dynamics of the motion and appearance, respectively. To circumvent the dependency of the training data between the two models, we use an iterative update strategy. Once the models are trained, there is no need to refine them to track specific objects, making our method efficient compared to online learning approaches. We demonstrate our real time pixel-wise object tracking framework on a challenging VOT dataset
- Nov 21 2017 quant-ph arXiv:1711.07376v1We propose a noisy quantum analogue of the well known Stuart--Landau equation for weakly nonlinear oscillators. Surprisingly, we find the oscillator's amplitude to be amplified by the very same noise responsible for its stochastic dynamics. This has interesting implications for the theory of linear amplifiers and the so-called quantum van der Pol model used in quantum synchronisation. We then go beyond the weakly nonlinear regime and obtain an exact quantum analogue of the nonlinear oscillator which van der Pol proposed in his 1926 paper.
- We study a classification problem where each feature can be acquired for a cost and the goal is to optimize the trade-off between classification precision and the total feature cost. We frame the problem as a sequential decision-making problem, where we classify one sample in each episode. At each step, an agent can use values of acquired features to decide whether to purchase another one or whether to classify the sample. We use vanilla Double Deep Q-learning, a standard reinforcement learning technique, to find a classification policy. We show that this generic approach outperforms Adapt-Gbrt, currently the best-performing algorithm developed specifically for classification with costly features.
- Neural networks have demonstrated considerable success in a wide variety of real-world problems. However, the presence of adversarial examples - slightly perturbed inputs that are misclassified with high confidence - limits our ability to guarantee performance for these networks in safety-critical applications. We demonstrate that, for networks that are piecewise affine (for example, deep networks with ReLU and maxpool units), proving no adversarial example exists - or finding the closest example if one does exist - can be naturally formulated as solving a mixed integer program. Solves for a fully-connected MNIST classifier with three hidden layers can be completed an order of magnitude faster than those of the best existing approach. To address the concern that adversarial examples are irrelevant because pixel-wise attacks are unlikely to happen in natural images, we search for adversaries over a natural class of perturbations written as convolutions with an adversarial blurring kernel. When searching over blurred images, we find that as opposed to pixelwise attacks, some misclassifications are impossible. Even more interestingly, a small fraction of input images are provably robust to blurs: every blurred version of the input is classified with the same, correct label.
- By lifting the ReLU function into a higher dimensional space, we develop a smooth multi-convex formulation for training feed-forward deep neural networks (DNNs). This allows us to develop a block coordinate descent (BCD) training algorithm consisting of a sequence of numerically well-behaved convex optimizations. Using ideas from proximal point methods in convex analysis, we prove that this BCD algorithm will converge globally to a stationary point with R-linear convergence rate of order one. In experiments with the MNIST database, DNNs trained with this BCD algorithm consistently yielded better test-set error rates than identical DNN architectures trained via all the stochastic gradient descent (SGD) variants in the Caffe toolbox.
- Nov 21 2017 astro-ph.CO hep-ph arXiv:1711.07344v1We show that in the Feebly Interacting Massive Particle (FIMP) model of Dark Matter (DM), one may express the inflationary energy scale $H_*$ as a function of three otherwise unrelated quantities, the DM isocurvature perturbation amplitude, its mass and its self-coupling constant, independently of the tensor-to-scalar ratio. The FIMP model assumes that there exists a real scalar particle that alone constitutes the DM content of the Universe and couples to the Standard Model via a Higgs portal. We consider carefully the various astrophysical, cosmological and model constraints, accounting also for variations in inflationary dynamics and the reheating history, to derive a robust estimate for $H_*$ that is confined to a relatively narrow range. We point out that, within the context of the FIMP DM model, one may thus determine $H_*$ reliably even in the absence of observable tensor perturbations.
- Robots operate in environments with varying implicit structure. For instance, a helicopter flying over terrain encounters a very different arrangement of obstacles than a robotic arm manipulating objects on a cluttered table top. State-of-the-art motion planning systems do not exploit this structure, thereby expending valuable planning effort searching for implausible solutions. We are interested in planning algorithms that actively infer the underlying structure of the valid configuration space during planning in order to find solutions with minimal effort. Consider the problem of evaluating edges on a graph to quickly discover collision-free paths. Evaluating edges is expensive, both for robots with complex geometries like robot arms, and for robots with limited onboard computation like UAVs. Until now, this challenge has been addressed via laziness i.e. deferring edge evaluation until absolutely necessary, with the hope that edges turn out to be valid. However, all edges are not alike in value - some have a lot of potentially good paths flowing through them, and some others encode the likelihood of neighbouring edges being valid. This leads to our key insight - instead of passive laziness, we can actively choose edges that reduce the uncertainty about the validity of paths. We show that this is equivalent to the Bayesian active learning paradigm of decision region determination (DRD). However, the DRD problem is not only combinatorially hard, but also requires explicit enumeration of all possible worlds. We propose a novel framework that combines two DRD algorithms, DIRECT and BISECT, to overcome both issues. We show that our approach outperforms several state-of-the-art algorithms on a spectrum of planning problems for mobile robots, manipulators and autonomous helicopters.
- We analyse how the standard reductions between constraint satisfaction problems affect their proof complexity. We show that, for the most studied propositional, algebraic, and semi-algebraic proof systems, the classical constructions of pp-interpretability, homomorphic equivalence and addition of constants to a core preserve the proof complexity of the CSP. As a result, for those proof systems, the classes of constraint languages for which small unsatisfiability certificates exist can be characterised algebraically. We illustrate our results by a gap theorem saying that a constraint language either has resolution refutations of constant width, or does not have bounded-depth Frege refutations of subexponential size. The former holds exactly for the widely studied class of constraint languages of bounded width. This class is also known to coincide with the class of languages with refutations of sublinear degree in Sums-of-Squares and Polynomial Calculus over the real-field, for which we provide alternative proofs. We then ask for the existence of a natural proof system with good behaviour with respect to reductions and simultaneously small size refutations beyond bounded width. We give an example of such a proof system by showing that bounded-degree Lovász-Schrijver satisfies both requirements. Finally, building on the known lower bounds, we demonstrate the applicability of the method of reducibilities and construct new explicit hard instances of the graph 3-coloring problem for all studied proof systems.
- Nov 21 2017 cs.CV arXiv:1711.07319v1The topic of multi-person pose estimation has been largely improved recently, especially with the development of convolutional neural network. However, there still exist a lot of challenging cases, such as occluded keypoints, invisible keypoints and complex background, which cannot be well addressed. In this paper, we present a novel network structure called Cascaded Pyramid Network (CPN) which targets to relieve the problem from these "hard" keypoints. More specifically, our algorithm includes two stages: GlobalNet and RefineNet. GlobalNet is a feature pyramid network which can successfully localize the "simple" keypoints like eyes and hands but may fail to precisely recognize the occluded or invisible keypoints. Our RefineNet tries explicitly handling the "hard" keypoints by integrating all levels of feature representations from the GlobalNet together with an online hard keypoint mining loss. In general, to address the multi-person pose estimation problem, a top-down pipeline is adopted to first generate a set of human bounding boxes based on a detector, followed by our CPN for keypoint localization in each human bounding box. Based on the proposed algorithm, we achieve state-of-art results on the COCO keypoint benchmark, with average precision at 73.0 on the COCO test-dev dataset and 72.1 on the COCO test-challenge dataset, which is a 19% relative improvement compared with 60.5 from the COCO 2016 keypoint challenge.
- Nov 21 2017 cond-mat.mes-hall arXiv:1711.07313v1We propose an experimentally friendly scheme for trapping quasi- relativistic elec- trons in graphene by an electromagnetic beam with circular polarization and spatially inhomogeneous profile with an intensity dip. The trapping is achieved due to the ef- fect of bandgap opening outside the trapping region. The proposed mechanism allows for non- invasive electron confinement in graphene without any need of the chemical patterning of the sample or the application of metallic gates.
- Nov 21 2017 cs.CV arXiv:1711.07312v1We develop a Computer Aided Diagnosis (CAD) system, which enhances the performance of dentists in detecting wide range of dental caries. The CAD System achieves this by acting as a second opinion for the dentists with way higher sensitivity on the task of detecting cavities than the dentists themselves. We develop annotated dataset of more than 3000 bitewing radiographs and utilize it for developing a system for automated diagnosis of dental caries. Our system consists of a deep fully convolutional neural network (FCNN) consisting 100+ layers, which is trained to mark caries on bitewing radiographs. We have compared the performance of our proposed system with three certified dentists for marking dental caries. We exceed the average performance of the dentists in both recall (sensitivity) and F1-Score (agreement with truth) by a very large margin. Working example of our system is shown in Figure 1.
- Nov 21 2017 cs.MM arXiv:1711.07306v1Deep learning based image steganalysis has attracted increasing attentions in recent years. Several Convolutional Neural Network (CNN) models have been proposed and achieved state-of-the-art performances on detecting steganography. In this paper, we explore an important technique in deep learning, the batch normalization, for the task of image steganalysis. Different from natural image classification, steganalysis is to discriminate cover images and stego images which are the result of adding weak stego signals into covers. This characteristic makes a cover image is more statistically similar to its stego than other cover images, requiring steganalytic methods to use paired learning to extract effective features for image steganalysis. Our theoretical analysis shows that a CNN model with multiple normalization layers is hard to be generalized to new data in the test set when it is well trained with paired learning. To hand this difficulty, we propose a novel normalization technique called Shared Normalization (SN) in this paper. Unlike the batch normalization layer utilizing the mini-batch mean and standard deviation to normalize each input batch, SN shares same statistics for all training and test batches. Based on the proposed SN layer, we further propose a novel neural network model for image steganalysis. Extensive experiments demonstrate that the proposed network with SN layers is stable and can detect the state of the art steganography with better performances than previous methods.
- This paper describes autonomous racing of RC race cars based on mathematical optimization. Using a dynamical model of the vehicle, control inputs are computed by receding horizon based controllers, where the objective is to maximize progress on the track subject to the requirement of staying on the track and avoiding opponents. Two different control formulations are presented. The first controller employs a two-level structure, consisting of a path planner and a nonlinear model predictive controller (NMPC) for tracking. The second controller combines both tasks in one nonlinear optimization problem (NLP) following the ideas of contouring control. Linear time varying models obtained by linearization are used to build local approximations of the control NLPs in the form of convex quadratic programs (QPs) at each sampling time. The resulting QPs have a typical MPC structure and can be solved in the range of milliseconds by recent structure exploiting solvers, which is key to the real-time feasibility of the overall control scheme. Obstacle avoidance is incorporated by means of a high-level corridor planner based on dynamic programming, which generates convex constraints for the controllers according to the current position of opponents and the track layout. The control performance is investigated experimentally using 1:43 scale RC race cars, driven at speeds of more than 3 m/s and in operating regions with saturated rear tire forces (drifting). The algorithms run at 50 Hz sampling rate on embedded computing platforms, demonstrating the real-time feasibility and high performance of optimization-based approaches for autonomous racing.
- We study a noncommutative analogue of a space(time) foliated by (spacelike) hypersurfaces. First, in the classical (commutative) case, we show that the canonical Dirac operator on the total space(time) (with either Riemannian or Lorentzian signature) can be reconstructed from the family of Dirac operators on the (spacelike) hypersurfaces. Second, in the noncommutative case, the same construction continues to make sense for an abstract family of spectral triples, and we prove that (in the case of Riemannian signature) the construction yields in fact a spectral triple, which we call a product spectral triple. In the case of Lorentzian signature, the corresponding 'Lorentzian spectral triple' can also be viewed as the 'reverse Wick rotation' of such product spectral triples. This construction of 'Lorentzian spectral triples' fits well into the Krein space approach to noncommutative Lorentzian geometry.
- We introduce and study the notion of conic stability of multivariate complex polynomials in $\mathbb{C}[z_1,\ldots, z_n]$, which naturally generalizes the stability of multivariate polynomials. In particular, we generalize Borcea's and Brändén's multivariate version of the Hermite-Kakeya-Obreschkoff Theorem to the conic stability and provide a characterization in terms of a directional Wronskian. And we generalize a major criterion for stability of determinantal polynomials to stability with respect to the positive semidefinite cone.
- The Exact Set Similarity Join problem aims to find all similar sets between two collections of sets, with respect to a threshold and a similarity function such as overlap, Jaccard, dice or cosine. The naive approach verifies all pairs of sets and it is often considered impractical due the high number of combinations. So, Exact Set Similarity Join algorithms are usually based on the Filter-Verification Framework, that applies a series of filters to reduce the number of verified pairs. This paper presents a new filtering technique called Bitmap Filter, which is able to accelerate state-of-the-art algorithms for the exact Set Similarity Join problem. The Bitmap Filter uses hash functions to create bitmaps of fixed b bits, representing characteristics of the sets. Then, it applies bitwise operations (such as xor and population count) on the bitmaps in order to infer a similarity upper bound for each pair of sets. If the upper bound is below a given similarity threshold, the pair of sets is pruned. The Bitmap Filter benefits from the fact that bitwise operations are efficiently implemented by many modern general-purpose processors and it was easily applied to four state-of-the-art algorithms implemented in CPU: AllPairs, PPJoin, AdaptJoin and GroupJoin. Furthermore, we propose a Graphic Processor Unit (GPU) algorithm based on the naive approach but using the Bitmap Filter to speedup the computation. The experiments considered 9 collections containing from 100 thousands up to 10 million sets and the joins were made using Jaccard thresholds from 0.50 to 0.95. The Bitmap Filter was able to improve 90% of the experiments in CPU, with speedups of up to 4.50x and 1.43x on average. Using the GPU algorithm, the experiments were able to speedup the original CPU algorithms by up to 577x using an Nvidia Geforce GTX 980 Ti.
- In many machine learning tasks it is desirable that a model's prediction transforms in an equivariant way under transformations of its input. Convolutional neural networks (CNNs) implement translational equivariance by construction; for other transformations, however, they are compelled to learn the proper mapping. In this work, we develop Steerable Filter CNNs (SFCNNs) which achieve joint equivariance under translations and rotations by design. The proposed architecture employs steerable filters to efficiently compute orientation dependent responses for many orientations without suffering interpolation artifacts from filter rotation. We utilize group convolutions which guarantee an equivariant mapping. In addition, we generalize He's weight initialization scheme to filters which are defined as a linear combination of a system of atomic filters. Numerical experiments show a substantial enhancement of the sample complexity with a growing number of sampled filter orientations and confirm that the network generalizes learned patterns over orientations. The proposed approach achieves state-of-the-art on the rotated MNIST benchmark and on the ISBI 2012 2D EM segmentation challenge.
- This note concerns a somewhat innocent question motivated by an observation concerning the use of Chebyshev bounds on sample estimates of $p$ in the binomial distribution with parameters $n,p$. Namely, what moment order produces the best Chebyshev estimate of $p$? If $S_n(p)$ has a binomial distribution with parameters $n,p$, there it is readily observed that ${\rm argmax}_{0\le p\le 1}{\mathbb E}S_n^2(p) = {\rm argmax}_{0\le p\le 1}np(1-p) = \frac12,$ and ${\mathbb E}S_n^2(\frac12) = \frac{n}{4}$. Rabi Bhattacharya observed that while the second moment Chebyshev sample size for a $95\%$ confidence estimate within $\pm 5$ percentage points is $n = 2000$, the fourth moment yields the substantially reduced polling requirement of $n = 775$. Why stop at fourth moment? Is the argmax achieved at $p = \frac12$ for higher order moments and, if so, does it help, and compute $\mathbb{E}S_n^{2m}(\frac12)$? As captured by the title of this note, answers to these questions lead to a simple rule of thumb for best choice of moments in terms of an effective sample size for Chebyshev concentration inequalities.
- A robot that can carry out a natural-language instruction has been a dream since before the Jetsons cartoon series imagined a life of leisure mediated by a fleet of attentive robot helpers. It is a dream that remains stubbornly distant. However, recent advances in vision and language methods have made incredible progress in closely related areas. This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering. Both tasks can be interpreted as visually grounded sequence-to-sequence translation problems, and many of the same methods are applicable. To enable and encourage the application of vision and language methods to the problem of interpreting visually-grounded navigation instructions, we present the Matterport3D Simulator -- a large-scale reinforcement learning environment based on real imagery. Using this simulator, which can in future support a range of embodied vision and language tasks, we provide the first benchmark dataset for visually-grounded natural language navigation in real buildings -- the Room-to-Room (R2R) dataset.
- In this paper we document our experiences with developing speech recognition for medical transcription - a system that automatically transcribes doctor-patient conversations. Towards this goal, we built a system along two different methodological lines - a Connectionist Temporal Classification (CTC) phoneme based model and a Listen Attend and Spell (LAS) grapheme based model. To train these models we used a corpus of anonymized conversations representing approximately 14,000 hours of speech. Because of noisy transcripts and alignments in the corpus, a significant amount of effort was invested in data cleaning issues. We describe a two-stage strategy we followed for segmenting the data. The data cleanup and development of a matched language model was essential to the success of the CTC based models. The LAS based models, however were found to be resilient to alignment and transcript noise and did not require the use of language models. CTC models were able to achieve a word error rate of 20.1%, and the LAS models were able to achieve 18.3%. Our analysis shows that both models perform well on important medical utterances and therefore can be practical for transcribing medical conversations.
- Nov 21 2017 cs.AI arXiv:1711.07273v1There are many methodologies and techniques for easing the task of ontology building. Here we describe the intersection of two of these: ontology normalisation and fully programmatic ontology development. The first of these describes a standardized organisation for an ontology, with singly inherited self-standing entities, and a number of small taxonomies of refining entities. The former are described and defined in terms of the latter and used to manage the polyhierarchy of the self-standing entities. Fully programmatic development is a technique where an ontology is developed using a domain-specific language within a programming language, meaning that as well defining ontological entities, it is possible to add arbitrary patterns or new syntax within the same environment. We describe how new patterns can be used to enable a new style of ontology development that we call hypernormalisation.