- Nov 20 2017 quant-ph arXiv:1711.06658v1What is the energy cost of extracting entanglement from complex quantum systems? In other words, given a state of a quantum system, how much energy does it cost to extract m EPR pairs? This is an important question, particularly for quantum field theories where the vacuum is generally highly entangled. Here we build a theory to understand the energy cost of entanglement extraction. First, we apply it to a toy model, and then we define the entanglement temperature, which relates the energy cost to the amount of extracted entanglement. Next, we give a physical argument to find the energy cost of entanglement extraction in some condensed matter and quantum field systems. The energy cost for those quantum field theories depends on the spatial dimension, and in one dimension, for example, it grows exponentially with the number of EPR pairs extracted. Next, we outline some approaches for bounding the energy cost of extracting entanglement in general quantum systems. Finally, we look at the antiferromagnetic Heisenberg and transverse field Ising models numerically to calculate the entanglement temperature using matrix product states.
- Nov 20 2017 cond-mat.str-el quant-ph arXiv:1711.06559v1We prove that ground states of gapped local Hamiltonians on an infinite spin chain can be efficiently approximated by matrix product states with a bond dimension which scales as D~(L-1)/epsilon, where any local quantity on L consecutive spins is approximated to accuracy epsilon.
- Security for machine learning has begun to become a serious issue for present day applications. An important question remaining is whether emerging quantum technologies will help or hinder the security of machine learning. Here we discuss a number of ways that quantum information can be used to help make quantum classifiers more secure or private. In particular, we demonstrate a form of robust principal component analysis that, under some circumstances, can provide an exponential speedup relative to robust methods used at present. To demonstrate this approach we introduce a linear combinations of unitaries Hamiltonian simulation method that we show functions when given an imprecise Hamiltonian oracle, which may be of independent interest. We also introduce a new quantum approach for bagging and boosting that can use quantum superposition over the classifiers or splits of the training set to aggregate over many more models than would be possible classically. Finally, we provide a private form of $k$--means clustering that can be used to prevent an all powerful adversary from learning more than a small fraction of a bit from any user. These examples show the role that quantum technologies can play in the security of ML and vice versa. This illustrates that quantum computing can provide useful advantages to machine learning apart from speedups.
- Nov 20 2017 quant-ph arXiv:1711.06592v1We propose to perform quantum key distribution using quantum correlations occurring within thermal states produced by low power sources such as LED's. These correlations are exploited through the Hanbury Brown and Twiss effect. We build an optical central broadcast protocol using a superluminescent diode which allows switching between laser and thermal regimes, enabling us to provide experimental key rates in both regimes. We provide a theoretical analysis and show that quantum secrecy is possible, even in high noise situations.
- Nov 20 2017 quant-ph cond-mat.mes-hall arXiv:1711.06662v1We analyze a modified Bose-Hubbard model, where two cavities having on-site Kerr interactions are subject to two-photon driving and correlated dissipation. We derive an exact solution for the steady state of this interacting driven-dissipative system, and use it show that the system permits the preparation and stabilization of pure entangled non-Gaussian states, so-called entangled cat states. Unlike previous proposals for dissipative stabilization of such states, our approach requires only a linear coupling to a single engineered reservoir (as opposed to nonlinear couplings to two or more reservoirs). Our scheme is within the reach of state-of-the-art experiments in circuit QED.
- We propose a reduction for non-convex optimization that can (1) turn a stationary-point finding algorithm into a local-minimum finding one, and (2) replace the Hessian-vector product computations with only gradient computations. It works both in the stochastic and the deterministic settings, without hurting the algorithm's performance. As applications, our reduction turns Natasha2 into a first-order method without hurting its performance. It also converts SGD, GD, SCSG, and SVRG into local-minimum finding algorithms outperforming some best known results.
- Machine learning systems, which are often used for high-stakes decisions, suffer from two mutually reinforcing problems: unfairness and opaqueness. Many popular models, although generally accurate, cannot express uncertainty about their predictions. Even in regimes where a model is inaccurate, users may trust the model's predictions too fully, and allow its biases to reinforce the user's own. In this work, we explore models that learn to defer. In our scheme, a model learns to classify accurately and fairly, but also to defer if necessary, passing judgment to a downstream decision-maker such as a human user. We further propose a learning algorithm which accounts for potential biases held by decision-makers later in a pipeline. Experiments on real-world datasets demonstrate that learning to defer can make a model not only more accurate but also less biased. Even when operated by highly biased users, we show that deferring models can still greatly improve the fairness of the entire pipeline.
- Nov 20 2017 cs.CV arXiv:1711.06636v1We explore encoding brain symmetry into a neural network for a brain tumor segmentation task. A healthy human brain is symmetric at a high level of abstraction, and the high-level asymmetric parts are more likely to be tumor regions. Paying more attention to asymmetries has the potential to boost the performance in brain tumor segmentation. We propose a method to encode brain symmetry into existing neural networks and apply the method to a state-of-the-art neural network for medical imaging segmentation. We evaluate our symmetry-encoded network on the dataset from a brain tumor segmentation challenge and verify that the new model extracts information in the training images more efficiently than the original model.
- We present an experimental study on the non-equilibrium tunnel dynamics of two coupled one-dimensional Bose-Einstein quasi-condensates deep in the Josephson regime. Josephson oscillations are initiated by splitting a single one-dimensional condensate and imprinting a relative phase between the superfluids. Regardless of the initial state and experimental parameters, the dynamics of the relative phase and atom number imbalance shows a relaxation to a phase-locked steady state. The latter is characterized by a high phase coherence and reduced fluctuations with respect to the initial state. We propose an empirical model based on the analogy with the anharmonic oscillator to describe the effect of various experimental parameters. A microscopic theory compatible with our observations is still missing.
- Machine learning models are vulnerable to adversarial examples: minor, in many cases imperceptible, perturbations to classification inputs. Among other suspected causes, adversarial examples exploit ML models that offer no well-defined indication as to how well a particular prediction is supported by training data, yet are forced to confidently extrapolate predictions in areas of high entropy. In contrast, Bayesian ML models, such as Gaussian Processes (GP), inherently model the uncertainty accompanying a prediction in the well-studied framework of Bayesian Inference. This paper is first to explore adversarial examples and their impact on uncertainty estimates for Gaussian Processes. To this end, we first present three novel attacks on Gaussian Processes: GPJM and GPFGS exploit forward derivatives in GP latent functions, and Latent Space Approximation Networks mimic the latent space representation in unsupervised GP models to facilitate attacks. Further, we show that these new attacks compute adversarial examples that transfer to non-GP classification models, and vice versa. Finally, we show that GP uncertainty estimates not only differ between adversarial examples and benign data, but also between adversarial examples computed by different algorithms.
- Nov 20 2017 stat.ML arXiv:1711.06494v1Compression of Neural Networks (NN) has become a highly studied topic in recent years. The main reason for this is the demand for industrial scale usage of NNs such as deploying them on mobile devices, storing them efficiently, transmitting them via band-limited channels and most importantly doing inference at scale. In this work, we propose to join the Soft-Weight Sharing and Variational Dropout approaches that show strong results to define a new state-of-the-art in terms of model compression.
- We use deep feedforward artificial neural networks to approximate solutions of partial differential equations of advection and diffusion type in complex geometries. We derive analytical expressions of the gradients of the cost function with respect to the network parameters, as well as the gradient of the network itself with respect to the input, for arbitrarily deep networks. The method is based on an ansatz for the solution, which requires nothing but feedforward neural networks, and an unconstrained gradient based optimization method such as gradient descent or quasi-Newton methods. We provide detailed examples on how to use deep feedforward neural networks as a basis for further work on deep neural network approximations to partial differential equations. We highlight the benefits of deep compared to shallow neural networks and other convergence enhancing techniques.
- Given a tensor $f$ in a Euclidean tensor space, we are interested in the critical points of the distance function from $f$ to the set of tensors of rank at most $k$, which we call the critical rank-at-most-$k$ tensors for $f$. When $f$ is a matrix, the critical rank-one matrices for $f$ correspond to the singular pairs of $f$. The critical rank-one tensors for $f$ lie in a linear subspace $H_f$, the critical space of $f$. Our main result is that, for any $k$, the critical rank-at-most-$k$ tensors for a sufficiently general $f$ also lie in the critical space $H_f$. This is the part of Eckart-Young Theorem that generalizes from matrices to tensors. Moreover, we show that when the tensor format satisfies the triangle inequalities, the critical space $H_f$ is spanned by the complex critical rank-one tensors. Since $f$ itself belongs to $H_f$, we deduce that also $f$ itself is a linear combination of its critical rank-one tensors.
- Nov 20 2017 cs.CV arXiv:1711.06439v1Pancreas segmentation in computed tomography imaging has been historically difficult for automated methods because of the large shape and size variations between patients. In this work, we describe a custom-build 3D fully convolutional network (FCN) that can process a 3D image including the whole pancreas and produce an automatic segmentation. We investigate two variations of the 3D FCN architecture; one with concatenation and one with summation skip connections to the decoder part of the network. We evaluate our methods on a dataset from a clinical trial with gastric cancer patients, including 147 contrast enhanced abdominal CT scans acquired in the portal venous phase. Using the summation architecture, we achieve an average Dice score of 89.7 $\pm$ 3.8 (range [79.8, 94.8]) % in testing, achieving the new state-of-the-art performance in pancreas segmentation on this dataset.
- Nov 20 2017 cs.CV arXiv:1711.06396v1Accurate detection of objects in 3D point clouds is a central problem in many applications, such as autonomous navigation, housekeeping robots, and augmented/virtual reality. To interface a highly sparse LiDAR point cloud with a region proposal network (RPN), most existing efforts have focused on hand-crafted feature representations, for example, a bird's eye view projection. In this work, we remove the need of manual feature engineering for 3D point clouds and propose VoxelNet, a generic 3D detection network that unifies feature extraction and bounding box prediction into a single stage, end-to-end trainable deep network. Specifically, VoxelNet divides a point cloud into equally spaced 3D voxels and transforms a group of points within each voxel into a unified feature representation through the newly introduced voxel feature encoding (VFE) layer. In this way, the point cloud is encoded as a descriptive volumetric representation, which is then connected to a RPN to generate detections. Experiments on the KITTI car detection benchmark show that VoxelNet outperforms the state-of-the-art LiDAR based 3D detection methods by a large margin. Furthermore, our network learns an effective discriminative representation of objects with various geometries, leading to encouraging results in 3D detection of pedestrians and cyclists, based on only LiDAR.
- We develop a set of methods to improve on the results of self-supervised learning using context. We start with a baseline of patch based arrangement context learning and go from there. Our methods address some overt problems such as chromatic aberration as well as other potential problems such as spatial skew and mid-level feature neglect. We prevent problems with testing generalization on common self-supervised benchmark tests by using different datasets during our development. The results of our methods combined yield top scores on all standard self-supervised benchmarks, including classification and detection on PASCAL VOC 2007, segmentation on PASCAL VOC 2012, and "linear tests" on the ImageNet and CSAIL Places datasets. We obtain an improvement over our baseline method of between 4.0 to 7.1 percentage points on transfer learning classification tests. We also show results on different standard network architectures to demonstrate generalization as well as portability.
- The detection of gravitational waves with LIGO and Virgo requires a detailed understanding of the response of these instruments in the presence of environmental and instrumental noise. Of particular interest is the study of non-Gaussian noise transients known as glitches, since their high occurrence rate in LIGO/Virgo data can obscure or even mimic true gravitational wave signals. Therefore, successfully identifying and excising glitches is of utmost importance to detect and characterize gravitational waves. In this article, we present the first application of Deep Learning combined with Transfer Learning for glitch classification, using real data from LIGO's first discovery campaign labeled by Gravity Spy, showing that knowledge from pre-trained models for real-world object recognition can be transferred for classifying spectrograms of glitches. We demonstrate that this method enables the optimal use of very deep convolutional neural networks for glitch classification given small unbalanced training datasets, significantly reduces the training time, and achieves state-of-the-art accuracy above 98.8%. Once trained via transfer learning, we show that the networks can be truncated and used as feature extractors for unsupervised clustering to automatically group new classes of glitches. This feature is of critical importance to identify and remove new types of glitches which will occur as the LIGO/Virgo detectors gradually attain design sensitivity.
- Nov 21 2017 cs.LG arXiv:1711.07465v1We develop a new family of convex relaxations for $k$-means clustering based on sum-of-squares norms, a relaxation of the injective tensor norm that is efficiently computable using the Sum-of-Squares algorithm. We give an algorithm based on this relaxation that recovers a faithful approximation to the true means in the given data whenever the low-degree moments of the points in each cluster have bounded sum-of-squares norms. We then prove a sharp upper bound on the sum-of-squares norms for moment tensors of any distribution that satisfies the \emphPoincare inequality. The Poincare inequality is a central inequality in probability theory, and a large class of distributions satisfy it including Gaussians, product distributions, strongly log-concave distributions, and any sum or uniformly continuous transformation of such distributions. As an immediate corollary, for any $\gamma > 0$, we obtain an efficient algorithm for learning the means of a mixture of $k$ arbitrary \Poincare distributions in $\mathbb{R}^d$ in time $d^{O(1/\gamma)}$ so long as the means have separation $\Omega(k^{\gamma})$. This in particular yields an algorithm for learning Gaussian mixtures with separation $\Omega(k^{\gamma})$, thus partially resolving an open problem of Regev and Vijayaraghavan \citetregev2017learning. Our algorithm works even in the outlier-robust setting where an $\epsilon$ fraction of arbitrary outliers are added to the data, as long as the fraction of outliers is smaller than the smallest cluster. We, therefore, obtain results in the strong agnostic setting where, in addition to not knowing the distribution family, the data itself may be arbitrarily corrupted.
- Brain-Machine Interaction (BMI) system motivates interesting and promising results in forward/feedback control consistent with human intention. It holds great promise for advancements in patient care and applications to neurorehabilitation. Here, we propose a novel neurofeedback-based BCI robotic platform using a personalized social robot in order to assist patients having cognitive deficits through bilateral rehabilitation and mental training. For initial testing of the platform, electroencephalography (EEG) brainwaves of a human user were collected in real time during tasks of imaginary movements. First, the brainwaves associated with imagined body kinematics parameters were decoded to control a cursor on a computer screen in training protocol. Then, the experienced subject was able to interact with a social robot via our real-time BMI robotic platform. Corresponding to subject's imagery performance, he/she received specific gesture movements and eye color changes as neural-based feedback from the robot. This hands-free neurofeedback interaction not only can be used for mind control of a social robot's movements, but also sets the stage for application to enhancing and recovering mental abilities such as attention via training in humans by providing real-time neurofeedback from a social robot.
- Conditional variants of Generative Adversarial Networks (GANs), known as cGANs, are generative models that can produce data samples ($x$) conditioned on both latent variables ($z$) and known auxiliary information ($c$). Another GAN variant, Bidirectional GAN (BiGAN) is a recently developed framework for learning the inverse mapping from $x$ to $z$ through an encoder trained simultaneously with the generator and the discriminator of an unconditional GAN. We propose the Bidirectional Conditional GAN (BCGAN), which combines cGANs and BiGANs into a single framework with an encoder that learns inverse mappings from $x$ to both $z$ and $c$, trained simultaneously with the conditional generator and discriminator in an end-to-end setting. We present crucial techniques for training BCGANs, which incorporate an extrinsic factor loss along with an associated dynamically-tuned importance weight. As compared to other encoder-based GANs, BCGANs not only encode $c$ more accurately but also utilize $z$ and $c$ more effectively and in a more disentangled way to generate data samples.
- While deep neural networks have been shown in recent years to outperform other machine learning methods in a wide range of applications, one of the biggest challenges with enabling deep neural networks for widespread deployment on edge devices such as mobile and other consumer devices is high computational and memory requirements. Recently, there has been greater exploration into small deep neural network architectures that are more suitable for edge devices, with one of the most popular architectures being SqueezeNet, with an incredibly small model size of 4.8MB. Taking further advantage of the notion that many applications of machine learning on edge devices are often characterized by a low number of target classes, this study explores the utility of combining architectural modifications and an evolutionary synthesis strategy for synthesizing even smaller deep neural architectures based on the more recent SqueezeNet v1.1 macroarchitecture for applications with fewer target classes. In particular, architectural modifications are first made to SqueezeNet v1.1 to accommodate for a 10-class ImageNet-10 dataset, and then an evolutionary synthesis strategy is leveraged to synthesize more efficient deep neural networks based on this modified macroarchitecture. The resulting SquishedNets possess model sizes ranging from 2.4MB to 0.95MB (~5.17X smaller than SqueezeNet v1.1, or 253X smaller than AlexNet). Furthermore, the SquishedNets are still able to achieve accuracies ranging from 81.2% to 77%, and able to process at speeds of 156 images/sec to as much as 256 images/sec on a Nvidia Jetson TX1 embedded chip. These preliminary results show that a combination of architectural modifications and an evolutionary synthesis strategy can be a useful tool for producing very small deep neural network architectures that are well-suited for edge device scenarios.
- Nov 21 2017 cs.DS arXiv:1711.07454v1We use the Sum of Squares method to develop new efficient algorithms for learning well-separated mixtures of Gaussians and robust mean estimation, both in high dimensions, that substantially improve upon the statistical guarantees achieved by previous efficient algorithms. Firstly, we study mixtures of $k$ distributions in $d$ dimensions, where the means of every pair of distributions are separated by at least $k^{\varepsilon}$. In the special case of spherical Gaussian mixtures, we give a $(dk)^{O(1/\varepsilon^2)}$-time algorithm that learns the means assuming separation at least $k^{\varepsilon}$, for any $\varepsilon > 0$. This is the first algorithm to improve on greedy ("single-linkage") and spectral clustering, breaking a long-standing barrier for efficient algorithms at separation $k^{1/4}$. We also study robust estimation. When an unknown $(1-\varepsilon)$-fraction of $X_1,\ldots,X_n$ are chosen from a sub-Gaussian distribution with mean $\mu$ but the remaining points are chosen adversarially, we give an algorithm recovering $\mu$ to error $\varepsilon^{1-1/t}$ in time $d^{O(t^2)}$, so long as sub-Gaussian-ness up to $O(t)$ moments can be certified by a Sum of Squares proof. This is the first polynomial-time algorithm with guarantees approaching the information-theoretic limit for non-Gaussian distributions. Previous algorithms could not achieve error better than $\varepsilon^{1/2}$. Both of these results are based on a unified technique. Inspired by recent algorithms of Diakonikolas et al. in robust statistics, we devise an SDP based on the Sum of Squares method for the following setting: given $X_1,\ldots,X_n \in \mathbb{R}^d$ for large $d$ and $n = poly(d)$ with the promise that a subset of $X_1,\ldots,X_n$ were sampled from a probability distribution with bounded moments, recover some information about that distribution.
- Data driven research on Android has gained a great momentum these years. The abundance of data facilitates knowledge learning, however, also increases the difficulty of data preprocessing. Therefore, it is non-trivial to prepare a demanding and accurate set of data for research. In this work, we put forward AndroVault, a framework for the Android research composing of data collection, knowledge representation and knowledge extraction. It has started with a long-running web crawler for data collection (both apps and description) since 2013, which guarantees the timeliness of data; With static analysis and dynamic analysis of the collected data, we compute a variety of attributes to characterize Android apps. After that, we employ a knowledge graph to connect all these apps by computing their correlation in terms of attributes; Last, we leverage multiple technologies such as logical inference, machine learning, and correlation analysis to extract facts (more accurate and demanding, either high level or not, data) that are beneficial for a specific research problem. With the produced data of high quality, we have successfully conducted many research works including malware detection, code generation, and Android testing. We would like to release our data to the research community in an authenticated manner, and encourage them to conduct productive research.
- The Morris Water Maze is commonly used in behavioural neuroscience for the study of spatial learning with rodents. Over the years, various methods of analysing rodent data collected in this task have been proposed. These methods span from classical performance measurements (e.g. escape latency, rodent speed, quadrant preference) to more sophisticated methods of categorisation which classify the animal swimming path into behavioural classes known as strategies. Classification techniques provide additional insight in relation to the actual animal behaviours but still only a limited amount of studies utilise them mainly because they highly depend on machine learning knowledge. We have previously demonstrated that the animals implement various strategies and by classifying whole trajectories can lead to the loss of important information. In this work, we developed a generalised and robust classification methodology which implements majority voting to boost the classification performance and successfully nullify the need of manual tuning. Based on this framework, we built a complete software, capable of performing the full analysis described in this paper. The software provides an easy to use graphical user interface (GUI) through which users can enter their trajectory data, segment and label them and finally generate reports and figures of the results.
- Minimizing job scheduling time is a fundamental issue in data center networks that has been extensively studied in recent years. The incoming jobs require different CPU and memory units, and span different number of time slots. The traditional solution is to design efficient heuristic algorithms with performance guarantee under certain assumptions. In this paper, we improve a recently proposed job scheduling algorithm using deep reinforcement learning and extend it to multiple server clusters. Our study reveals that deep reinforcement learning method has the potential to outperform traditional resource allocation algorithms in a variety of complicated environments.
- Nov 21 2017 cs.LG arXiv:1711.07437v1Nonnegative Matrix Factorization (NMF) is a widely used technique for data representation. Inspired by the expressive power of deep learning, several NMF variants equipped with deep architectures have been proposed. However, these methods mostly use the only nonnegativity while ignoring task-specific features of data. In this paper, we propose a novel deep approximately orthogonal nonnegative matrix factorization method where both nonnegativity and orthogonality are imposed with the aim to perform a hierarchical clustering by using different level of abstractions of data. Experiment on two face image datasets showed that the proposed method achieved better clustering performance than other deep matrix factorization methods and state-of-the-art single layer NMF variants.
- Nov 21 2017 math.SP arXiv:1711.07435v1Studying the spectral theory of Schroedinger operator on metric graphs (also known as quantum graphs) is advantageous on its own and also as a demonstration of key concepts of general spectral theory. There are some excellent references for this study such as a mathematically oriented book by Berkolaiko and Kuchment, a review with applications to theoretical physicsby Gnutzmann and Smilansky, and elementary lecture notes by Berkolaiko. Here we provide a set of questions and exercises which can accompany the reading of these references or an elementary course on quantum graphs. The exercises are taken from courses on quantum graphs which were taught by the authors.
- Pairwise "same-cluster" queries are one of the most widely used forms of supervision in semi-supervised clustering. However, it is impractical to ask human oracles to answer every query correctly. In this paper, we study the influence of allowing "not-sure" answers from a weak oracle and propose an effective algorithm to handle such uncertainties in query responses. Two realistic weak oracle models are considered where ambiguity in answering depends on the distance between two points. We show that a small query complexity is adequate for effective clustering with high probability by providing better pairs to the weak oracle. Experimental results on synthetic and real data show the effectiveness of our approach in overcoming supervision uncertainties and yielding high quality clusters.
- Action recognition is an important yet challenging task in computer vision. In this paper, we propose a novel deep-based framework for action recognition, which improves the recognition accuracy by: 1) deriving more precise features for representing actions, and 2) reducing the asynchrony between different information streams. We first introduce a coarse-to-fine network which extracts shared deep features at different action class granularities and progressively integrates them to obtain a more accurate feature representation for input actions. We further introduce an asynchronous fusion network. It fuses information from different streams by asynchronously integrating stream-wise features at different time points, hence better leveraging the complementary information in different streams. Experimental results on action recognition benchmarks demonstrate that our approach achieves the state-of-the-art performance.
- Nov 21 2017 cs.CV arXiv:1711.07426v12D object detection is the task of finding (i) what objects are present in an image and (ii) where they are located, while 3D pose estimation is the task of finding the pose of these objects in 3D space. State-of-the-art methods for solving these tasks follow a two-stage approach where a 3D pose estimation system is applied to bounding boxes (with associated category labels) returned by a 2D detection method. This paper addresses the task of joint object category and 3D pose estimation given a 2D bounding box. We design a residual network based architecture to solve these two seemingly orthogonal tasks with new category-dependent pose outputs and loss functions, and show state-of-the-art performance on the challenging Pascal3D+ dataset.
- A core aspect of human intelligence is the ability to learn new tasks quickly and switch between them flexibly. Here, we describe a modular continual reinforcement learning paradigm inspired by these abilities. We first introduce a visual interaction environment that allows many types of tasks to be unified in a single framework. We then describe a reward map prediction scheme that learns new tasks robustly in the very large state and action spaces required by such an environment. We investigate how properties of module architecture influence efficiency of task learning, showing that a module motif incorporating specific design principles (e.g. early bottlenecks, low-order polynomial nonlinearities, and symmetry) significantly outperforms more standard neural network motifs, needing fewer training examples and fewer neurons to achieve high levels of performance. Finally, we present a meta-controller architecture for task switching based on a dynamic neural voting scheme, which allows new modules to use information learned from previously-seen tasks to substantially improve their own learning efficiency.
- Nov 21 2017 cs.CV arXiv:1711.07419v1In interactive medical image segmentation, anatomical structures are extracted from reconstructed volumetric images. The first iterations of user interaction traditionally consist of drawing pictorial hints as an initial estimate of the object to extract. Only after this time consuming first phase, the efficient selective refinement of current segmentation results begins. Erroneously labeled seeds, especially near the border of the object, are challenging to detect and replace for a human and may substantially impact the overall segmentation quality. We propose an automatic seeding pipeline as well as a configuration based on saliency recognition, in order to skip the time-consuming initial interaction phase during segmentation. A median Dice score of 68.22% is reached before the first user interaction on the test data set with an error rate in seeding of only 0.088%.
- Evidence suggests that both the interaction of so-called Merkel cells and the epidermal stress distribution play an important role in the formation of fingerprint patterns during pregnancy. To model the formation of fingerprint patterns in a biologically meaningful way these patterns have to become stationary. For the creation of synthetic fingerprints it is also very desirable that rescaling the model parameters leads to rescaled distances between the stationary fingerprint ridges. Based on these observations, as well as the model introduced by Kücken and Champod we propose a new model for the formation of fingerprint patterns during pregnancy. In this anisotropic interaction model the interaction forces not only depend on the distance vector between the cells and the model parameters, but additionally on an underlying tensor field, representing a stress field. This dependence on the tensor field leads to complex, anisotropic patterns. We study the resulting stationary patterns both analytically and numerically. In particular, we show that fingerprint patterns can be modeled as stationary solutions by choosing the underlying tensor field appropriately.
- Transparency, user trust, and human comprehension are popular ethical motivations for interpretable machine learning. In support of these goals, researchers evaluate model explanation performance using humans and real world applications. This alone presents a challenge in many areas of artificial intelligence. In this position paper, we propose a distinction between descriptive and persuasive explanations. We discuss reasoning suggesting that functional interpretability may be correlated with cognitive function and user preferences. If this is indeed the case, evaluation and optimization using functional metrics could perpetuate implicit cognitive bias in explanations that threaten transparency. Finally, we propose two potential research directions to disambiguate cognitive function and explanation models, retaining control over the tradeoff between accuracy and interpretability.
- Nov 21 2017 cs.CL arXiv:1711.07404v1One of the most crucial components of natural human-robot interaction is artificial intuition and its influence on dialog systems. The intuitive capability that humans have is undeniably extraordinary, and so remains one of the greatest challenges for natural communicative dialogue between humans and robots. In this paper, we introduce a novel probabilistic modeling framework of identifying, classifying and learning features of sarcastic text via training a neural network with human-informed sarcastic benchmarks. This is necessary for establishing a comprehensive sentiment analysis schema that is sensitive to the nuances of sarcasm-ridden text by being trained on linguistic cues. We show that our model provides a good fit for this type of real-world informed data, with potential to achieve as accurate, if not more, than alternatives. Though the implementation and benchmarking is an extensive task, it can be extended via the same method that we present to capture different forms of nuances in communication and making for much more natural and engaging dialogue systems.
- Nov 21 2017 cs.CV arXiv:1711.07399v1Most of the existing deep learning-based methods for 3D hand and human pose estimation from a single depth map are based on a common framework that takes a 2D depth map and directly regresses the 3D coordinates of keypoints, such as hand or human body joints, via 2D convolutional neural networks (CNNs). The first weakness of this approach is the presence of perspective distortion in the 2D depth map. While the depth map is intrinsically 3D data, many previous methods treat depth maps as 2D images that can distort the shape of the actual object through projection from 3D to 2D space. This compels the network to perform perspective distortion-invariant estimation. The second weakness of the conventional approach is that directly regressing 3D coordinates from a 2D image is a highly non-linear mapping, which causes difficulty in the learning procedure. To overcome these weaknesses, we firstly cast the 3D hand and human pose estimation problem from a single depth map into a voxel-to-voxel prediction that uses a 3D voxelized grid and estimates the per-voxel likelihood for each keypoint. We design our model as a 3D CNN that provides accurate estimates while running in real-time. Our system outperforms previous methods in almost all publicly available 3D hand and human pose estimation datasets and placed first in the HANDS 2017 frame-based 3D hand pose estimation challenge.
- Nov 21 2017 hep-ph arXiv:1711.07388v1We present the Fortran95 program Recola2 for the perturbative computation of next-to-leading-order transition amplitudes in the Standard Model of particle physics and extended Higgs sectors. New theories are implemented via model files in the 't Hooft-Feynman gauge in the conventional formulation of quantum field theory and in the Background-Field method. The present version includes model files for the Two-Higgs-Doublet Model and the Higgs-Singlet Extension of the Standard Model. We support standard renormalization schemes for the Standard Model as well as many commonly used renormalization schemes in extended Higgs sectors. Within these models the computation of next-to-leading-order polarized amplitudes and squared amplitudes, optionally summed over spin and colour, is fully automated for any process. Recola2 allows the computation of colour- and spin-correlated leading-order squared amplitudes that are needed in the dipole subtraction formalism. Recola2 is publicly available for download at http://recola.hepforge.org.
- Nov 21 2017 cs.CV arXiv:1711.07377v1In this paper, we propose a novel pixel-wise visual object tracking framework that can track any anonymous object in a noisy background. The framework consists of two submodels, a global attention model and a local segmentation model. The global model generates a region of interests (ROI) that the object may lie in the new frame based on the past object segmentation maps, while the local model segments the new image in the ROI. Each model uses a LSTM structure to model the temporal dynamics of the motion and appearance, respectively. To circumvent the dependency of the training data between the two models, we use an iterative update strategy. Once the models are trained, there is no need to refine them to track specific objects, making our method efficient compared to online learning approaches. We demonstrate our real time pixel-wise object tracking framework on a challenging VOT dataset
- We study a classification problem where each feature can be acquired for a cost and the goal is to optimize the trade-off between classification precision and the total feature cost. We frame the problem as a sequential decision-making problem, where we classify one sample in each episode. At each step, an agent can use values of acquired features to decide whether to purchase another one or whether to classify the sample. We use vanilla Double Deep Q-learning, a standard reinforcement learning technique, to find a classification policy. We show that this generic approach outperforms Adapt-Gbrt, currently the best-performing algorithm developed specifically for classification with costly features.
- Neural networks have demonstrated considerable success in a wide variety of real-world problems. However, the presence of adversarial examples - slightly perturbed inputs that are misclassified with high confidence - limits our ability to guarantee performance for these networks in safety-critical applications. We demonstrate that, for networks that are piecewise affine (for example, deep networks with ReLU and maxpool units), proving no adversarial example exists - or finding the closest example if one does exist - can be naturally formulated as solving a mixed integer program. Solves for a fully-connected MNIST classifier with three hidden layers can be completed an order of magnitude faster than those of the best existing approach. To address the concern that adversarial examples are irrelevant because pixel-wise attacks are unlikely to happen in natural images, we search for adversaries over a natural class of perturbations written as convolutions with an adversarial blurring kernel. When searching over blurred images, we find that as opposed to pixelwise attacks, some misclassifications are impossible. Even more interestingly, a small fraction of input images are provably robust to blurs: every blurred version of the input is classified with the same, correct label.
- By lifting the ReLU function into a higher dimensional space, we develop a smooth multi-convex formulation for training feed-forward deep neural networks (DNNs). This allows us to develop a block coordinate descent (BCD) training algorithm consisting of a sequence of numerically well-behaved convex optimizations. Using ideas from proximal point methods in convex analysis, we prove that this BCD algorithm will converge globally to a stationary point with R-linear convergence rate of order one. In experiments with the MNIST database, DNNs trained with this BCD algorithm consistently yielded better test-set error rates than identical DNN architectures trained via all the stochastic gradient descent (SGD) variants in the Caffe toolbox.
- Robots operate in environments with varying implicit structure. For instance, a helicopter flying over terrain encounters a very different arrangement of obstacles than a robotic arm manipulating objects on a cluttered table top. State-of-the-art motion planning systems do not exploit this structure, thereby expending valuable planning effort searching for implausible solutions. We are interested in planning algorithms that actively infer the underlying structure of the valid configuration space during planning in order to find solutions with minimal effort. Consider the problem of evaluating edges on a graph to quickly discover collision-free paths. Evaluating edges is expensive, both for robots with complex geometries like robot arms, and for robots with limited onboard computation like UAVs. Until now, this challenge has been addressed via laziness i.e. deferring edge evaluation until absolutely necessary, with the hope that edges turn out to be valid. However, all edges are not alike in value - some have a lot of potentially good paths flowing through them, and some others encode the likelihood of neighbouring edges being valid. This leads to our key insight - instead of passive laziness, we can actively choose edges that reduce the uncertainty about the validity of paths. We show that this is equivalent to the Bayesian active learning paradigm of decision region determination (DRD). However, the DRD problem is not only combinatorially hard, but also requires explicit enumeration of all possible worlds. We propose a novel framework that combines two DRD algorithms, DIRECT and BISECT, to overcome both issues. We show that our approach outperforms several state-of-the-art algorithms on a spectrum of planning problems for mobile robots, manipulators and autonomous helicopters.
- We analyse how the standard reductions between constraint satisfaction problems affect their proof complexity. We show that, for the most studied propositional, algebraic, and semi-algebraic proof systems, the classical constructions of pp-interpretability, homomorphic equivalence and addition of constants to a core preserve the proof complexity of the CSP. As a result, for those proof systems, the classes of constraint languages for which small unsatisfiability certificates exist can be characterised algebraically. We illustrate our results by a gap theorem saying that a constraint language either has resolution refutations of constant width, or does not have bounded-depth Frege refutations of subexponential size. The former holds exactly for the widely studied class of constraint languages of bounded width. This class is also known to coincide with the class of languages with refutations of sublinear degree in Sums-of-Squares and Polynomial Calculus over the real-field, for which we provide alternative proofs. We then ask for the existence of a natural proof system with good behaviour with respect to reductions and simultaneously small size refutations beyond bounded width. We give an example of such a proof system by showing that bounded-degree Lovász-Schrijver satisfies both requirements. Finally, building on the known lower bounds, we demonstrate the applicability of the method of reducibilities and construct new explicit hard instances of the graph 3-coloring problem for all studied proof systems.
- Nov 21 2017 cs.CV arXiv:1711.07319v1The topic of multi-person pose estimation has been largely improved recently, especially with the development of convolutional neural network. However, there still exist a lot of challenging cases, such as occluded keypoints, invisible keypoints and complex background, which cannot be well addressed. In this paper, we present a novel network structure called Cascaded Pyramid Network (CPN) which targets to relieve the problem from these "hard" keypoints. More specifically, our algorithm includes two stages: GlobalNet and RefineNet. GlobalNet is a feature pyramid network which can successfully localize the "simple" keypoints like eyes and hands but may fail to precisely recognize the occluded or invisible keypoints. Our RefineNet tries explicitly handling the "hard" keypoints by integrating all levels of feature representations from the GlobalNet together with an online hard keypoint mining loss. In general, to address the multi-person pose estimation problem, a top-down pipeline is adopted to first generate a set of human bounding boxes based on a detector, followed by our CPN for keypoint localization in each human bounding box. Based on the proposed algorithm, we achieve state-of-art results on the COCO keypoint benchmark, with average precision at 73.0 on the COCO test-dev dataset and 72.1 on the COCO test-challenge dataset, which is a 19% relative improvement compared with 60.5 from the COCO 2016 keypoint challenge.
- Nov 21 2017 cs.CV arXiv:1711.07312v1We develop a Computer Aided Diagnosis (CAD) system, which enhances the performance of dentists in detecting wide range of dental caries. The CAD System achieves this by acting as a second opinion for the dentists with way higher sensitivity on the task of detecting cavities than the dentists themselves. We develop annotated dataset of more than 3000 bitewing radiographs and utilize it for developing a system for automated diagnosis of dental caries. Our system consists of a deep fully convolutional neural network (FCNN) consisting 100+ layers, which is trained to mark caries on bitewing radiographs. We have compared the performance of our proposed system with three certified dentists for marking dental caries. We exceed the average performance of the dentists in both recall (sensitivity) and F1-Score (agreement with truth) by a very large margin. Working example of our system is shown in Figure 1.
- Nov 21 2017 cs.MM arXiv:1711.07306v1Deep learning based image steganalysis has attracted increasing attentions in recent years. Several Convolutional Neural Network (CNN) models have been proposed and achieved state-of-the-art performances on detecting steganography. In this paper, we explore an important technique in deep learning, the batch normalization, for the task of image steganalysis. Different from natural image classification, steganalysis is to discriminate cover images and stego images which are the result of adding weak stego signals into covers. This characteristic makes a cover image is more statistically similar to its stego than other cover images, requiring steganalytic methods to use paired learning to extract effective features for image steganalysis. Our theoretical analysis shows that a CNN model with multiple normalization layers is hard to be generalized to new data in the test set when it is well trained with paired learning. To hand this difficulty, we propose a novel normalization technique called Shared Normalization (SN) in this paper. Unlike the batch normalization layer utilizing the mini-batch mean and standard deviation to normalize each input batch, SN shares same statistics for all training and test batches. Based on the proposed SN layer, we further propose a novel neural network model for image steganalysis. Extensive experiments demonstrate that the proposed network with SN layers is stable and can detect the state of the art steganography with better performances than previous methods.
- Nov 21 2017 cs.CV arXiv:1711.07302v1Zero-shot learning (ZSL) aims to recognize objects from novel unseen classes without any training data. Recently, structure-transfer based methods are proposed to implement ZSL by transferring structural knowledge from the semantic embedding space to image feature space to classify testing images. However, we observe that such a knowledge transfer framework may suffer from the problem of the geometric inconsistency between the data in the training and testing spaces. We call this problem as the space shift problem. In this paper, we propose a novel graph based method to alleviate this space shift problem. Specifically, a Shared Reconstruction Graph (SRG) is pursued to capture the common structure of data in the two spaces. With the learned SRG, each unseen class prototype (cluster center) in the image feature space can be synthesized by the linear combination of other class prototypes, so that testing instances can be classified based on the distance to these synthesized prototypes. The SRG bridges the image feature space and semantic embedding space. By applying spectral clustering on the learned SRG, many meaningful clusters can be discovered, which interprets ZSL performance on the datasets. Our method can be easily extended to the generalized zero-shot learning setting. Experiments on three popular datasets show that our method outperforms other methods on all datasets. Even with a small number of training samples, our method can achieve the state-of-the-art performance.
- This paper describes autonomous racing of RC race cars based on mathematical optimization. Using a dynamical model of the vehicle, control inputs are computed by receding horizon based controllers, where the objective is to maximize progress on the track subject to the requirement of staying on the track and avoiding opponents. Two different control formulations are presented. The first controller employs a two-level structure, consisting of a path planner and a nonlinear model predictive controller (NMPC) for tracking. The second controller combines both tasks in one nonlinear optimization problem (NLP) following the ideas of contouring control. Linear time varying models obtained by linearization are used to build local approximations of the control NLPs in the form of convex quadratic programs (QPs) at each sampling time. The resulting QPs have a typical MPC structure and can be solved in the range of milliseconds by recent structure exploiting solvers, which is key to the real-time feasibility of the overall control scheme. Obstacle avoidance is incorporated by means of a high-level corridor planner based on dynamic programming, which generates convex constraints for the controllers according to the current position of opponents and the track layout. The control performance is investigated experimentally using 1:43 scale RC race cars, driven at speeds of more than 3 m/s and in operating regions with saturated rear tire forces (drifting). The algorithms run at 50 Hz sampling rate on embedded computing platforms, demonstrating the real-time feasibility and high performance of optimization-based approaches for autonomous racing.
- In this paper we compare two mathematical frameworks that make perturbative quantum field theory rigorous: the perturbative algebraic quantum field theory (pAQFT) and the factorization algebras framework developed by Costello and Gwilliam. To make the comparison as explicit as possible, we focus here on the example of the free scalar field. The main claim is that both approaches are equivalent if one assumes the time-slice axiom. The key technical ingredient is to use time-ordered products as an intermediate step between a net of associative algebras and a factorization algebra.
- Nov 20 2017 cs.CV arXiv:1711.06666v1In order to convey the most content in their limited space, advertisements embed references to outside knowledge via symbolism. For example, a motorcycle stands for adventure (a positive property the ad wants associated with the product being sold), and a gun stands for danger (a negative property to dissuade viewers from undesirable behaviors). We show how to use symbolic references to better understand the meaning of an ad. We further show how anchoring ad understanding in general-purpose object recognition and image captioning can further improve results. We formulate the ad understanding task as matching the ad image to human-generated statements that describe the action that the ad prompts and the rationale it provides for this action. We greatly outperform the state of the art in this task. We also show additional applications of our learned representations for ranking the slogans of ads, and clustering ads according to their topic.