- Suppose a large scale quantum computer becomes available over the Internet. Could we delegate universal quantum computations to this server, using only classical communication between client and server, in a way that is information-theoretically blind (i.e., the server learns nothing about the input apart from its size, with no cryptographic assumptions required)? In this paper we give strong indications that the answer is no. This contrasts with the situation where quantum communication between client and server is allowed --- where we know that such information-theoretically blind quantum computation is possible. It also contrasts with the case where cryptographic assumptions are allowed: there again, it is now known that there are quantum analogues of fully homomorphic encryption. In more detail, we observe that, if there exist information-theoretically secure classical schemes for performing universal quantum computations on encrypted data, then we get unlikely containments between complexity classes, such as ${\sf BQP} \subset {\sf NP/poly}$. Moreover, we prove that having such schemes for delegating quantum sampling problems, such as Boson Sampling, would lead to a collapse of the polynomial hierarchy. We then consider encryption schemes which allow one round of quantum communication and polynomially many rounds of classical communication, yielding a generalization of blind quantum computation. We give a complexity theoretic upper bound, namely ${\sf QCMA/qpoly} \cap {\sf coQCMA/qpoly}$, on the types of functions that admit such a scheme. This upper bound then lets us show that, under plausible complexity assumptions, such a protocol is no more useful than classical schemes for delegating ${\sf NP}$-hard problems to the server. Lastly, we comment on the implications of these results for the prospect of verifying a quantum computation through classical interaction with the server.
- Quantum computing is rapidly approaching the point of technological deployment. Functional quantum devices will require the ability to correct errors in order to be scalable and effective. A leading choice of error correction, in particular for modular or distributed architectures, is the surface code with logical two-qubit operations realised via "lattice surgery". These operations consist of "merges" and "splits" acting non-unitarily on the logical states and are not easily captured by standard circuit notation. This raises the question of how best to reason about lattice surgery in order to use quantum states and operations efficiently in architectures with complex resource management issues. In this paper we demonstrate that the operations of the ZX calculus, a form of quantum diagrammatic reasoning designed using category theory, match exactly the operations of lattice surgery. Red and green "spider" nodes match rough and smooth merges and splits, and follow the axioms of a dagger special associative Frobenius algebra. Some lattice surgery operations can require non-trivial correction operations, which are captured natively in the use of the ZX calculus in the form of ensembles of diagrams. We give a first taste of the power of the calculus as a language for surgery by considering two operations (magic state use and producing a CNOT) and show how ZX diagram re-write rules give lattice surgery procedures for these operations that are novel, efficient, and highly configurable.
- Let $V=\bigotimes_{k=1}^{N} V_{k}$ be the $N$ spin-$j$ Hilbert space with single-particle spaces of dimension $d=2j+1$. We fix an orthonormal basis $\{|m_i\rangle\}$ for each $V_{k}$, with weight $m_i\in \{-j,\ldots,j\}$. Let $V_{(w)}$ be the subspace of $V$ with constant weight $w$, with an orthonormal basis $\{|m_1,\ldots,m_N\rangle\}$ subject to $\sum_k m_k=w$. We show that the combinatorial properties of the constant weight condition impose strong constraints on the reduced density matrices of any vector $|\psi\rangle$ in the constant weight subspace, which limit the possible entanglement structures of $|\psi\rangle$. Our results find applications in the overlapping quantum marginal problems, quantum error-correcting codes, and the spin-network structures in quantum gravity.
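A minimal numerical sketch (ours, with assumed example values $N=4$, $j=1$, $w=0$) of the setting above: build the constant-weight basis, draw a random state in $V_{(w)}$, and observe one structural constraint, namely that single-site reduced density matrices are forced to be diagonal in the weight basis.

```python
import itertools
import numpy as np

N, j, w = 4, 1, 0                       # assumed example values
weights = np.arange(-j, j + 1)          # single-site weights {-j, ..., j}
d = len(weights)                        # single-particle dimension d = 2j + 1

# Basis of V_(w): product states |m_1,...,m_N> with sum_k m_k = w
basis = [m for m in itertools.product(range(d), repeat=N)
         if np.isclose(weights[list(m)].sum(), w)]

rng = np.random.default_rng(0)
amps = rng.normal(size=len(basis)) + 1j * rng.normal(size=len(basis))
amps /= np.linalg.norm(amps)            # random |psi> in V_(w)

# Reduced density matrix of site 0: trace out sites 1..N-1
rho = np.zeros((d, d), dtype=complex)
for a, ma in zip(amps, basis):
    for b, mb in zip(amps, basis):
        if ma[1:] == mb[1:]:
            rho[ma[0], mb[0]] += a * np.conj(b)

# The constant weight condition forces rho to be diagonal in the weight
# basis: off-diagonal terms would need m != m' with the same remainder sum.
print(np.round(rho, 3))
```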
- Apr 28 2017 quant-ph arXiv:1704.08668v1 We provide a new way to bound the security of quantum key distribution using only the diagrammatic behavior of complementary observables and essential uniqueness of purification for quantum channels. We begin by demonstrating a proof in the simplest case, where the eavesdropper doesn't noticeably disturb the channel at all and has no quantum memory. We then show how this case extends with almost no effort to account for quantum memory and noise.
- Apr 28 2017 quant-ph arXiv:1704.08314v1 We describe the hardware, gateware, and software developed at Raytheon BBN Technologies for dynamic quantum information processing experiments on superconducting qubits. In dynamic experiments, real-time qubit state information is fed back or fed forward within a fraction of the qubits' coherence time to dynamically change the implemented sequence. The hardware presented here covers both control and readout of superconducting qubits. For readout we created a custom signal processing gateware and software stack on commercial hardware to convert pulses in a heterodyne receiver into qubit state assignments with minimal latency, alongside data taking capability. For control, we developed custom hardware with gateware and software for pulse sequencing and steering information distribution that is capable of arbitrary control flow within a fraction of superconducting qubit coherence times. Both readout and control platforms make extensive use of FPGAs to enable tailored qubit control systems in a reconfigurable fabric suitable for iterative development.
- We study the strong duality of non-convex matrix factorization: we show that, under certain dual conditions, non-convex matrix factorization and its dual have the same optimum. This has been well understood for convex optimization, but little was known for matrix factorization. We formalize the strong duality of matrix factorization through a novel analytical framework, and show that the duality gap is zero for a wide class of matrix factorization problems. Although matrix factorization problems are hard to solve in full generality, under certain conditions the optimal solution of the non-convex program is the same as its bi-dual, and we can achieve global optimality of the non-convex program by solving its bi-dual. We apply our framework to matrix completion and robust Principal Component Analysis (PCA). While a long line of work has studied these problems, for basic problems in this area such as matrix completion, the information-theoretically optimal sample complexity was not known, and the sample complexity bounds are even larger if one also requires computational efficiency. In this work, we show that exact recoverability and strong duality hold with optimal sample complexity guarantees for matrix completion, and nearly-optimal guarantees for exact recoverability of robust PCA. For matrix completion, under the standard incoherence assumption that the underlying rank-$r$ matrix $X^*\in \mathbb{R}^{n\times n}$ with skinny SVD $U \Sigma V^T$ has $\max\{\|U^Te_i\|_2^2, \|V^Te_i\|_2^2\} \leq \frac{\mu r}{n}$ for all $i$, to the best of our knowledge we give (1) the first non-efficient algorithm achieving the optimal $O(\mu n r \log n)$ sample complexity, and (2) the first efficient algorithm achieving $O(\kappa^2\mu n r \log n)$ sample complexity, which matches the known $\Omega(\mu n r \log n)$ information-theoretic lower bound for constant condition number $\kappa$.
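As a quick illustration of the incoherence condition quoted above, the following numpy sketch (ours, with arbitrary sizes) computes the smallest $\mu$ for which a random rank-$r$ matrix satisfies $\max\{\|U^Te_i\|_2^2, \|V^Te_i\|_2^2\} \leq \frac{\mu r}{n}$:

```python
import numpy as np

n, r = 200, 5
rng = np.random.default_rng(1)
X = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))   # rank-r matrix X*

U, s, Vt = np.linalg.svd(X, full_matrices=False)
U, V = U[:, :r], Vt[:r, :].T                            # skinny SVD factors

row_norms_U = (U ** 2).sum(axis=1)      # ||U^T e_i||_2^2 for each i
row_norms_V = (V ** 2).sum(axis=1)      # ||V^T e_i||_2^2 for each i

# Smallest mu satisfying the bound; the guarantees scale as O(mu * n * r * log n)
mu = n / r * max(row_norms_U.max(), row_norms_V.max())
print(f"incoherence parameter mu ~= {mu:.2f}")
```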
- Apr 28 2017 quant-ph arXiv:1704.08525v1 There is a long history of representing a quantum state using a quasi-probability distribution: a distribution allowing negative values. In this paper we extend such representations to deal with quantum channels. The result is a convex, strongly monoidal, functorial embedding of the category of trace preserving completely positive maps into the category of quasi-stochastic matrices. This establishes quantum theory as a subcategory of quasi-stochastic processes. Such an embedding is induced by a choice of minimal informationally complete POVMs. We show that any two such embeddings are naturally isomorphic. The embedding preserves the dagger structure of the categories if and only if the POVMs are symmetric, giving a new use of SIC-POVMs. We also study general convex embeddings of quantum theory and prove a dichotomy that such an embedding is either trivial or faithful. The results of this paper allow a clear explanation of the characteristic features of quantum mechanics coming from being epistemically restricted (no-cloning, teleportation) and having negative probabilities (Bell inequalities, computational speed-up).
- Apr 28 2017 math.CO arXiv:1704.08661v1 When considering binary strings, it's natural to wonder how many distinct subsequences might exist in a given string. Given that there is an existing algorithm which provides a straightforward way to compute the number of distinct subsequences in a fixed string, we might next be interested in the expected number of distinct subsequences in random strings. This expected value is already known for random binary strings where each letter in the string is, independently, equally likely to be a 1 or a 0. We generalize this result to random strings where the letter 1 appears independently with probability $\alpha \in [0,1]$. Also, we make some progress in the case of random strings from an arbitrary alphabet as well as when the string is generated by a two-state Markov chain.
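The "existing algorithm" for a fixed string is the standard dynamic program; a short sketch (conventions vary in the literature — here the empty subsequence is included):

```python
import random

def count_distinct_subsequences(s: str) -> int:
    """Standard DP: counts distinct subsequences of s, empty one included."""
    total = 1                       # just the empty subsequence so far
    last = {}                       # count recorded when each letter last appeared
    for ch in s:
        new_total = 2 * total - last.get(ch, 0)   # subtract double-counted ones
        last[ch] = total
        total = new_total
    return total

# A random string in the generalized model: each letter is 1 with prob. alpha = 0.3
s = "".join("1" if random.random() < 0.3 else "0" for _ in range(20))
print(s, count_distinct_subsequences(s))
```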
- Reversible logic has two main properties. First, the number of inputs is equal to the number of outputs. Second, it implements a one-to-one mapping; i.e., one can reconstruct the inputs from the outputs. These properties enable its applications in building quantum computing architectures. In this paper we study reverse engineering of reversible logic circuits, including reverse engineering of non-reversible functions embedded into reversible circuits. We propose the number of embeddings of non-reversible functions into a reversible circuit as the security metric for reverse engineering. We analyze the security benefits of automatic synthesis of reversible circuits. We use our proposed security metric to show that the functional synthesis approaches yield reversible circuits that are more resilient to reverse engineering than the structural synthesis approaches. Finally, we propose scrambling of the inputs and outputs of a reversible circuit to thwart reverse engineering.
- Apr 28 2017 cs.CV arXiv:1704.08345v1 Existing zero-shot learning (ZSL) models typically learn a projection function from a feature space to a semantic embedding space (e.g. attribute space). However, such a projection function is only concerned with predicting the semantic representations of the seen training classes (e.g. attribute prediction) or with classification. When applied to test data, which in the context of ZSL contains different (unseen) classes without training data, a ZSL model typically suffers from the projection domain shift problem. In this work, we present a novel solution to ZSL based on learning a Semantic AutoEncoder (SAE). Taking the encoder-decoder paradigm, an encoder aims to project a visual feature vector into the semantic space as in the existing ZSL models. However, the decoder exerts an additional constraint, that is, the projection/code must be able to reconstruct the original visual feature. We show that with this additional reconstruction constraint, the learned projection function from the seen classes is able to generalise better to the new unseen classes. Importantly, the encoder and decoder are linear and symmetric, which enables us to develop an extremely efficient learning algorithm. Extensive experiments on six benchmark datasets demonstrate that the proposed SAE significantly outperforms the existing ZSL models with the additional benefit of lower computational cost. Furthermore, when the SAE is applied to the supervised clustering problem, it also beats the state-of-the-art.
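A hedged sketch of the kind of linear, tied-weight autoencoder described above. The objective below, $\min_W \|X - W^\top S\|_F^2 + \lambda\|WX - S\|_F^2$, is our reading of the abstract, not verbatim from the paper; setting its gradient to zero yields a Sylvester equation, which gives an efficient closed-form solve. Shapes and $\lambda$ are placeholders.

```python
import numpy as np
from scipy.linalg import solve_sylvester

d_feat, d_sem, n = 512, 85, 1000
rng = np.random.default_rng(0)
X = rng.normal(size=(d_feat, n))        # visual features, one column per image
S = rng.normal(size=(d_sem, n))         # semantic (e.g. attribute) vectors
lam = 0.2

# Zero gradient gives  (S S^T) W + W (lam X X^T) = (1 + lam) S X^T
A = S @ S.T
B = lam * (X @ X.T)
C = (1 + lam) * (S @ X.T)
W = solve_sylvester(A, B, C)            # encoder; the tied decoder is W.T

S_hat = W @ X                           # project features into semantic space
print(S_hat.shape)                      # (85, 1000)
```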
- End-to-end learning refers to training a possibly complex learning system by applying gradient-based learning to the system as a whole. An end-to-end learning system is specifically designed so that all modules are differentiable. In effect, not only a central learning machine, but also all "peripheral" modules like representation learning and memory formation are covered by a holistic learning process. The power of end-to-end learning has been demonstrated on many tasks, like playing a whole array of Atari video games with a single architecture. While pushing for solutions to more challenging tasks, network architectures keep growing more and more complex. In this paper we ask the question whether and to what extent end-to-end learning is a future-proof technique in the sense of scaling to complex and diverse data processing architectures. We point out potential inefficiencies, and we argue in particular that end-to-end learning does not make optimal use of the modular design of present neural networks. Our surprisingly simple experiments demonstrate these inefficiencies, up to the complete breakdown of learning.
- Apr 28 2017 cs.CV arXiv:1704.08686v1 We introduce a new framework for learning dense correspondence between deformable 3D shapes. Existing learning-based approaches model shape correspondence as a labelling problem, where each point of a query shape receives a label identifying a point on some reference domain; the correspondence is then constructed a posteriori by composing the label predictions of two input shapes. We propose a paradigm shift and design a structured prediction model in the space of functional maps, linear operators that provide a compact representation of the correspondence. We model the learning process via a deep residual network which takes dense descriptor fields defined on two shapes as input, and outputs a soft map between the two given objects. The resulting correspondence is shown to be accurate on several challenging benchmarks comprising multiple categories, synthetic models, real scans with acquisition artifacts, topological noise, and partiality.
- This paper is the first of three chapters of the author's undergraduate thesis. We study the random matrix ensemble of covariance matrices arising from random $(d_b, d_w)$-regular bipartite graphs on a set of $M$ black vertices and $N$ white vertices, for $d_b \gg \log^4 N$. We simultaneously prove that the Green's functions of these covariance matrices and the adjacency matrices of the underlying graphs agree with the corresponding limiting law (e.g. the Marchenko-Pastur law for covariance matrices) down to the optimal scale. This is an improvement over the previously known mesoscopic results. As a consequence, we obtain eigenvector delocalization for the covariance matrix ensemble, as well as a weak rigidity estimate.
- The theoretical description of non-renewal stochastic systems is a challenge. Analytical results are often not available or can only be obtained under strong conditions, limiting their applicability. Also, numerical results have mostly been obtained by ad hoc Monte Carlo simulations, which are usually computationally expensive when a high degree of accuracy is needed. To gain quantitative insight into these systems under general conditions, we here introduce a numerical iterated first-passage time approach based on solving the time-dependent Fokker-Planck equation (FPE) to describe the statistics of non-renewal stochastic systems. We illustrate the approach using spike-triggered neuronal adaptation in the leaky and perfect integrate-and-fire models, respectively. The transition to stationarity of first-passage time moments and their sequential correlations occurs on a non-trivial timescale that depends on all system parameters. Surprisingly, this is so for both single exponential and scale-free power-law adaptation. The method works beyond the small noise and timescale separation approximations. It shows excellent agreement with direct Monte Carlo simulations and allows for the computation of transient and stationary distributions. We compare different methods to compute the evolution of the moments and serial correlation coefficients (SCC), and discuss the challenge of reliably computing the SCC, which we find to be very sensitive to numerical inaccuracies for both the leaky and perfect integrate-and-fire models. In conclusion, our methods provide a general picture of non-renewal dynamics in a wide range of stochastic systems exhibiting short- and long-range correlations.
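For contrast with the paper's FPE-based approach, here is the kind of direct Monte Carlo baseline it is validated against: a leaky integrate-and-fire neuron with spike-triggered adaptation, from which interspike-interval (ISI) serial correlation coefficients are estimated. All parameter values below are illustrative placeholders, and the run is kept short for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, mu, tau_a, delta_a, D = 5e-4, 2.0, 0.5, 0.3, 0.1   # placeholder parameters

def simulate_isis(n_spikes=500):
    """Leaky integrate-and-fire with a spike-triggered adaptation current a."""
    v, a, t, t_last, isis = 0.0, 0.0, 0.0, 0.0, []
    while len(isis) < n_spikes:
        v += dt * (mu - v - a) + np.sqrt(2 * D * dt) * rng.normal()
        a -= dt * a / tau_a                 # exponential decay of adaptation
        t += dt
        if v >= 1.0:                        # threshold crossing: spike
            v = 0.0                         # voltage reset
            a += delta_a                    # spike-triggered adaptation jump
            isis.append(t - t_last)
            t_last = t
    return np.array(isis)

isis = simulate_isis()
x = isis - isis.mean()
scc = [x[:-k] @ x[k:] / (x @ x) for k in range(1, 6)]
print("SCC at lags 1..5:", np.round(scc, 3))
# Adaptation makes successive ISIs negatively correlated at lag 1 -- the
# non-renewal signature whose transient the paper tracks via iterated FPE solves.
```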
- Apr 28 2017 cs.CV arXiv:1704.08628v1 Text line detection and localization is a crucial step for full page document analysis, but still suffers from the heterogeneity of real life documents. In this paper, we present a new approach for full page text recognition. Localization of the text lines is based on regressions with Fully Convolutional Neural Networks and Multidimensional Long Short-Term Memory as contextual layers. In order to increase the efficiency of this localization method, only the position of the left side of each text line is predicted. The text recognizer is then in charge of predicting where the text to recognize ends. This method has shown good results for full page text recognition on the highly heterogeneous Maurdor dataset.
- Apr 28 2017 physics.soc-ph cond-mat.dis-nn arXiv:1704.08626v1 We study the extinction of long-lived epidemics on finite complex networks induced by intrinsic noise. Applying analytical techniques to the stochastic Susceptible-Infected-Susceptible model, we predict the distribution of large fluctuations, the most probable, or optimal, path through a network that leads to a disease-free state from an endemic state, and the average extinction time in general configurations. Our predictions agree with Monte Carlo simulations on several networks, including synthetic weighted and degree-distributed networks with degree correlations, and an empirical high school contact network. In addition, our approach quantifies characteristic scaling patterns for the optimal path and distribution of large fluctuations, both near and away from the epidemic threshold, in networks with heterogeneous eigenvector centrality and degree distributions.
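A toy Gillespie simulation (ours, not the paper's analytics) of stochastic SIS dynamics on a network, measuring extinction times. Parameters are placeholders set below the epidemic threshold so that runs terminate quickly; the paper's rare-event analysis targets the long-lived endemic regime, where extinction times grow exponentially.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
G = nx.barabasi_albert_graph(200, 3, seed=0)
beta, mu = 0.05, 1.0          # infection rate per S-I edge, recovery rate

def extinction_time():
    infected = set(rng.choice(G.number_of_nodes(), size=20, replace=False).tolist())
    t = 0.0
    while infected:
        si_edges = [(u, v) for u in infected for v in G[u] if v not in infected]
        total_rate = mu * len(infected) + beta * len(si_edges)
        t += rng.exponential(1.0 / total_rate)
        if rng.random() < mu * len(infected) / total_rate:
            victim = list(infected)[rng.integers(len(infected))]
            infected.remove(victim)                                 # recovery event
        else:
            infected.add(si_edges[rng.integers(len(si_edges))][1])  # infection event
    return t

times = [extinction_time() for _ in range(20)]
print("mean extinction time:", round(float(np.mean(times)), 2))
```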
- Automatic affect recognition is a challenging task due to the various modalities with which emotions can be expressed. Applications can be found in many domains including multimedia retrieval and human computer interaction. In recent years, deep neural networks have been used with great success in determining emotional states. Inspired by this success, we propose an emotion recognition system using auditory and visual modalities. To capture the emotional content of various styles of speaking, robust features need to be extracted. To this end, we utilize a Convolutional Neural Network (CNN) to extract features from the speech, while for the visual modality we use a deep residual network (ResNet) of 50 layers. In addition to the importance of feature extraction, a machine learning algorithm also needs to be insensitive to outliers while being able to model the context. To tackle this problem, Long Short-Term Memory (LSTM) networks are utilized. The system is then trained in an end-to-end fashion where - by also taking advantage of the correlations of each of the streams - we manage to significantly outperform traditional approaches based on auditory and visual handcrafted features for the prediction of spontaneous and natural emotions on the RECOLA database of the AVEC 2016 research challenge on emotion recognition.
- Apr 28 2017 cs.CV arXiv:1704.08614v1 Computer vision systems are designed to work well within the context of everyday photography. However, artists often render the world around them in ways that do not resemble photographs. Artwork produced by people is not constrained to mimic the physical world, making it more challenging for machines to recognize. This work is a step toward teaching machines how to categorize images in ways that are valuable to humans. First, we collect a large-scale dataset of contemporary artwork from Behance, a website containing millions of portfolios from professional and commercial artists. We annotate Behance imagery with rich attribute labels for content, emotions, and artistic media. Furthermore, we carry out baseline experiments to show the value of this dataset for artistic style prediction, for improving the generality of existing object classifiers, and for the study of visual domain adaptation. We believe our Behance Artistic Media dataset will be a good starting point for researchers wishing to study artistic imagery and relevant problems.
- Apr 28 2017 quant-ph arXiv:1704.08600v1 We construct $d\times d$ dimensional bound entangled states, which, for any $d>2$, violate a bipartite Bell inequality introduced in this paper. We conjecture that the proposed class of Bell inequalities acts as a family of dimension witnesses for bound entangled states: for any $d>2$, there exists a Bell inequality from this class which can be violated with bound entangled states only if their Hilbert space dimension is at least $d\times d$. Numerical results support this conjecture up to $d=8$.
- Apr 28 2017 cs.DS arXiv:1704.08592v1 Finding central nodes is a fundamental problem in network analysis. Betweenness centrality is a well-known measure which quantifies the importance of a node based on the fraction of shortest paths going through it. Due to the dynamic nature of many of today's networks, algorithms that quickly update centrality scores have become a necessity. For betweenness, several dynamic algorithms have been proposed over the years, targeting different update types (incremental-only, decremental-only, fully-dynamic). In this paper we introduce a new dynamic algorithm for updating betweenness centrality after an edge insertion or an edge weight decrease. Our method is a combination of two independent contributions: a faster algorithm for updating pairwise distances as well as the number of shortest paths, and a faster algorithm for updating dependencies. Whereas the worst-case running time of our algorithm is the same as recomputation, our techniques considerably reduce the number of operations performed by existing dynamic betweenness algorithms.
- We introduce Lipschitz-Killing curvature (LKC) regression, a new method to produce $(1-\alpha)$ thresholds for signal detection in random fields that does not require knowledge of the spatial correlation structure. The idea is to fit observed empirical Euler characteristics to the Gaussian kinematic formula via generalized least squares, which quickly and easily provides statistical estimates of the LKCs --- complex topological quantities that can be extremely challenging to compute, both theoretically and numerically. With these estimates, we can then make use of a powerful parametric approximation via Euler characteristics for Gaussian random fields to generate accurate $(1-\alpha)$ thresholds and $p$-values. The main features of our proposed LKC regression method are easy implementation, conceptual simplicity, and facilitated diagnostics, which we demonstrate in a variety of simulations and applications.
- Apr 28 2017 cs.CV arXiv:1704.08545v1 We focus on the challenging task of real-time semantic segmentation in this paper. It has many practical applications, yet poses the fundamental difficulty that a large portion of the computation for pixel-wise label inference must be eliminated. We propose a compressed-PSPNet-based image cascade network (ICNet) that incorporates multi-resolution branches under proper label guidance to address this challenge. We provide an in-depth analysis of our framework and introduce cascade feature fusion to quickly achieve high-quality segmentation. Our system yields real-time inference on a single GPU card with decent quality results evaluated on the challenging Cityscapes dataset.
- Apr 28 2017 cs.CL arXiv:1704.08531v1 This paper aims to catalyze the discussions about text feature extraction techniques using neural network architectures. The research questions discussed in the paper focus on the state-of-the-art neural network techniques that have proven to be useful tools for language processing, language generation, text classification and other computational linguistics tasks.
- Despite the recent success of deep-learning based semantic segmentation, deploying a pre-trained road scene segmenter to a city whose images are not present in the training set would not achieve satisfactory performance due to dataset biases. Instead of collecting a large number of annotated images of each city of interest to train or refine the segmenter, we propose an unsupervised learning approach to adapt road scene segmenters across different cities. By utilizing Google Street View and its time-machine feature, we can collect unannotated images for each road scene at different times, so that the associated static-object priors can be extracted accordingly. By advancing a joint global and class-specific domain adversarial learning framework, adaptation of pre-trained segmenters to a given city can be achieved without the need for any user annotation or interaction. We show that our method improves the performance of semantic segmentation in multiple cities across continents, while it performs favorably against state-of-the-art approaches requiring annotated training data.
- This paper presents a novel deep neural network (DNN) based speech enhancement method that aims to enhance the magnitude and phase components of speech signals simultaneously. The novelty of the proposed method is two-fold. First, to avoid the difficulty of direct clean phase estimation, the proposed algorithm adopts real and imaginary (RI) spectrograms to prepare both input and output features. In this way, the clean phase spectrograms can be effectively estimated from the enhanced RI spectrograms. Second, based on the RI spectrograms, a multi-metrics learning (MML) criterion is derived to optimize multiple objective metrics simultaneously. Different from the concept of multi-task learning, which incorporates heterogeneous features in the output layers, the MML criterion uses an objective function that considers different representations of speech signals (RI spectrograms, log power spectrograms, and waveform) during the enhancement process. Experimental results show that the proposed method can notably outperform the conventional DNN-based speech enhancement system that enhances the magnitude spectrogram alone. Furthermore, the MML criterion can further improve some objective metrics without trading off other objective metric scores.
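A minimal sketch (our illustration, not the paper's network) of the RI-spectrogram representation: the STFT's real and imaginary parts serve as input/output features, and any enhanced RI pair determines both magnitude and phase on reconstruction. An identity mapping stands in for the DNN.

```python
import numpy as np
from scipy.signal import stft, istft

fs = 16000
t = np.arange(fs) / fs
noisy = np.sin(2 * np.pi * 440 * t) + 0.3 * np.random.default_rng(0).normal(size=fs)

_, _, Z = stft(noisy, fs=fs, nperseg=512)
ri_features = np.stack([Z.real, Z.imag])        # model input/output representation

# A DNN would map noisy RI features to enhanced ones; identity stands in here.
enhanced = ri_features[0] + 1j * ri_features[1]
_, recon = istft(enhanced, fs=fs, nperseg=512)

# Magnitude and phase are both implied by the enhanced RI pair:
mag, phase = np.abs(enhanced), np.angle(enhanced)
print(recon.shape, mag.shape, phase.shape)
```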
- Apr 28 2017 cs.SE arXiv:1704.08476v1 Version information plays an important role in spreadsheet understanding, maintenance, and quality improvement. However, end users rarely use version control tools to document spreadsheet version information. Thus, the spreadsheet version information is missing, and different versions of a spreadsheet coexist as individual and similar spreadsheets. Existing approaches try to recover spreadsheet version information by clustering these similar spreadsheets based on spreadsheet filenames or related email conversations. However, the applicability and accuracy of existing clustering approaches are limited because the necessary information (e.g., filenames and email conversations) is usually missing. We inspected the versioned spreadsheets in VEnron, which was extracted from the Enron corpus. In VEnron, the different versions of a spreadsheet are clustered into an evolution group. We observed that the versioned spreadsheets in each evolution group exhibit certain common features (e.g., similar table headers and worksheet names). Based on this observation, we proposed an automatic clustering algorithm, SpreadCluster. SpreadCluster learns the criteria of features from the versioned spreadsheets in VEnron, and then automatically clusters spreadsheets with similar features into the same evolution group. We applied SpreadCluster to all spreadsheets in the Enron corpus. The evaluation result shows that SpreadCluster can cluster spreadsheets with higher precision and recall than the filename-based approach used by VEnron. Based on the clustering result of SpreadCluster, we further created a new versioned spreadsheet corpus, VEnron2, which is much bigger than VEnron. We also applied SpreadCluster to two other spreadsheet corpora, FUSE and EUSES. The results show that SpreadCluster can cluster the versioned spreadsheets in these two corpora with high precision.
- Apr 28 2017 cs.DS arXiv:1704.08445v1 A novel landmark-based oracle (CFLAT) is presented, which provides earliest-arrival-time route plans in time-dependent road networks. To our knowledge, this is the first oracle that preprocesses combinatorial structures (collections of time-stamped min-travel-time-path trees) rather than travel-time functions. The preprocessed data structure is exploited by a new query algorithm (CFCA) which computes, in addition to earliest-arrival-time estimations, the actual connecting path that preserves the theoretical approximation guarantees, and accounts for the cost of constructing it. To make it practical and tackle the main burden of landmark-based oracles (their large preprocessing requirements), CFLAT is extensively engineered. A thorough experimental evaluation on two real-world benchmark instances shows that CFLAT achieves a significant improvement on preprocessing, approximation guarantees and query times, in comparison to previous landmark-based oracles, whose query algorithms do not account for path construction. It also achieves competitive query-time performance and approximation guarantees compared to state-of-the-art speedup heuristics for time-dependent road networks, whose query times in most cases do not account for path construction.
- The technique of hiding messages in digital data is called steganography. With improved sequencing techniques, increasing attempts have been made to hide messages in deoxyribonucleic acid (DNA) sequences, which have become a medium for steganography. Many detection schemes have been developed for conventional digital data, but these schemes are not applicable to DNA sequences because of DNA's complex internal structures. In this paper, we propose the first DNA steganalysis framework for detecting hidden messages and conduct an experiment based on the random oracle model. Among the suitable models for the framework, splice junction classification using deep recurrent neural networks (RNNs) is most appropriate for performing DNA steganalysis. In our DNA steganalysis approach, we extract the hidden layer composed of RNNs to model the internal structure of a DNA sequence. We provide security for steganography schemes based on mutual entropy and provide simulation results that illustrate how our model detects hidden messages, independent of regions of a targeted reference genome. We apply our method to human genome datasets and determine that hidden messages in DNA sequences with a minimum sample size of 100 are detectable, regardless of the presence of hidden regions.
- Apr 28 2017 physics.soc-ph cs.SI arXiv:1704.08437v1 In the modern era where highly-commodified cultural products compete heavily for mass consumption, finding the principles behind the complex process of how successful, "hit" products emerge remains a vital scientific goal that requires an interdisciplinary approach. Here we present a framework for tracing the cycle of prosperity-and-decline of a product to find insights into influential and potent factors that determine its success. As a rapid, high-throughput indicator of the preference of the public, popularity charts have emerged as a useful information source for finding the market performance patterns of products over time, which we call the on-chart life trajectories that show how the products enter the chart, fare inside it, and eventually exit from it. We propose quantitative parameters to characterise a life trajectory, and analyse a large-scale data set of nearly 7,000 songs from Gaon Chart, a major weekly Korean Pop (K-Pop) chart, covering a span of six years. We find that a significant role is played by non-musical extrinsic factors, such as the might of production companies and possibly an established fan base, in the on-chart success of songs, strongly indicative of the commodified nature of modern cultural products. We also discuss several nontrivial yet intriguing trajectories, which we call the "Late Bloomers" and the "Re-entrants", that appear to be strongly driven by serendipitous exposure in mass media and the changes of seasons.
- Apr 28 2017 cs.LG arXiv:1704.08432v1 Chemical-chemical interaction (CCI) plays a key role in predicting candidate drugs, toxicity, therapeutic effects, and biological functions. CCI data have been generated from text mining, experiments, similarities, and databases; to date, no learning-based CCI prediction method exists. In chemical analyses, computational approaches are required. The recent remarkable growth and outstanding performance of deep learning have attracted considerable research attention. However, even in state-of-the-art drug analyses, deep learning continues to be used only as a classifier. Nevertheless, its purpose includes not only simple classification, but also automated feature extraction. In this paper, we propose the first end-to-end learning method for CCI, named DeepCCI. Hidden features are derived from a simplified molecular input line entry system (SMILES), which is a string notation representing the chemical structure, instead of learning from crafted features. To discover hidden representations for the SMILES strings, we use convolutional neural networks (CNNs). To guarantee the commutative property for homogeneous interaction, we apply model sharing and hidden representation merging techniques. The performance of DeepCCI was compared with a plain deep classifier and conventional machine learning methods. The proposed DeepCCI showed the best performance in all seven evaluation metrics used. In addition, the commutative property was experimentally validated. The automatically extracted features through end-to-end SMILES learning alleviate the significant effort required for manual feature engineering and are expected to improve prediction performance in drug analyses.
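A hedged PyTorch sketch of the commutativity device described above: one shared encoder for both chemicals and a symmetric merge (elementwise sum), so the predicted interaction score is invariant to the order of the pair. The encoder, vocabulary, and sizes are our placeholders, not the DeepCCI architecture.

```python
import torch
import torch.nn as nn

class PairScorer(nn.Module):
    def __init__(self, vocab=64, emb=32, hid=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.conv = nn.Sequential(
            nn.Conv1d(emb, hid, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1))
        self.head = nn.Sequential(nn.Linear(hid, 64), nn.ReLU(), nn.Linear(64, 1))

    def encode(self, smiles_ids):                      # (batch, seq) token ids
        x = self.embed(smiles_ids).transpose(1, 2)     # (batch, emb, seq)
        return self.conv(x).squeeze(-1)                # (batch, hid)

    def forward(self, a, b):
        # Elementwise sum is symmetric in (a, b), guaranteeing commutativity.
        merged = self.encode(a) + self.encode(b)
        return self.head(merged)

model = PairScorer()
a = torch.randint(0, 64, (8, 100))                     # token ids of SMILES strings
b = torch.randint(0, 64, (8, 100))
assert torch.allclose(model(a, b), model(b, a))        # order-invariant by construction
```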
- Apr 28 2017 cs.CL arXiv:1704.08430v1 Neural machine translation (NMT) heavily relies on an attention network to produce a context vector for each target word prediction. In practice, we find that context vectors for different target words are quite similar to one another and are therefore insufficient for discriminatively predicting target words. The reason for this might be that context vectors produced by the vanilla attention network are just a weighted sum of source representations that are invariant to decoder states. In this paper, we propose a novel GRU-gated attention model (GAtt) for NMT which enhances the degree of discrimination of context vectors by enabling source representations to be sensitive to the partial translation generated by the decoder. GAtt uses a gated recurrent unit (GRU) to combine two types of information: it treats a source annotation vector originally produced by the bidirectional encoder as the history state, and the corresponding previous decoder state as the input to the GRU. The GRU-combined information forms a new source annotation vector. In this way, we can obtain translation-sensitive source representations which are then fed into the attention network to generate discriminative context vectors. We further propose a variant that regards a source annotation vector as the current input and the previous decoder state as the history. Experiments on NIST Chinese-English translation tasks show that both GAtt-based models achieve significant improvements over the vanilla attention-based NMT. Further analyses of attention weights and context vectors demonstrate the effectiveness of GAtt in improving the discrimination power of representations and handling the challenging issue of over-translation.
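Our reading of the GAtt mechanism as a sketch: a GRU cell treats each source annotation as its hidden state and the previous decoder state as its input, producing translation-sensitive annotations before attention is computed. The dot-product scorer below is a simplification for brevity; the paper builds on the vanilla attention network.

```python
import torch
import torch.nn as nn

src_len, batch, d = 7, 2, 16
annotations = torch.randn(src_len, batch, d)   # from the bidirectional encoder
prev_dec_state = torch.randn(batch, d)

gru = nn.GRUCell(input_size=d, hidden_size=d)
# input = previous decoder state, hidden = source annotation (per source word)
refreshed = torch.stack([gru(prev_dec_state, annotations[i])
                         for i in range(src_len)])

# Attention over the refreshed, translation-sensitive annotations
scores = torch.einsum('sbd,bd->sb', refreshed, prev_dec_state)
weights = torch.softmax(scores, dim=0)
context = torch.einsum('sb,sbd->bd', weights, refreshed)
print(context.shape)                            # (batch, d) context vector
```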
- Apr 28 2017 cs.CL arXiv:1704.08390v1 This paper describes the Duluth system that participated in SemEval-2017 Task 6 #HashtagWars: Learning a Sense of Humor. The system participated in Subtasks A and B using N-gram language models, ranking highly in the task evaluation. This paper discusses the results of our system in the development and evaluation stages and from two post-evaluation runs.
- With the recent advancements in Artificial Intelligence (AI), various organizations and individuals have started debating whether the progress of AI is a blessing or a curse for the future of society. This paper conducts an investigation of how the public perceives the progress of AI by utilizing the data shared on Twitter. Specifically, this paper performs a comparative analysis of the understanding of users from two categories -- general AI-Tweeters (AIT) and expert AI-Tweeters (EAIT) -- who share posts about AI on Twitter. Our analysis revealed that users from both categories express distinct emotions and interests towards AI. Users from both categories regard AI as positive and are optimistic about the progress of AI, but the experts are more negative than the general AI-Tweeters. Characterization of users revealed that `London' is the most popular location from which users tweet about AI. Tweets posted by AIT are retweeted more than posts made by EAIT, which reveals greater diffusion of information from AIT.
- Apr 28 2017 cs.CL arXiv:1704.08387v1 We introduce a neural semantic parser which is interpretable and scalable. Our model converts natural language utterances to intermediate, domain-general natural language representations in the form of predicate-argument structures, which are induced with a transition system and subsequently mapped to target domains. The semantic parser is trained end-to-end using annotated logical forms or their denotations. We obtain competitive results on various datasets. The induced predicate-argument structures shed light on the types of representations useful for semantic parsing and how these are different from linguistically motivated ones.
- Apr 28 2017 cs.CL arXiv:1704.08384v1 Existing question answering methods infer answers either from a knowledge base or from raw text. While knowledge base (KB) methods are good at answering compositional questions, their performance is often affected by the incompleteness of the KB. In contrast, web text contains millions of facts that are absent in the KB, however in an unstructured form. Universal schema can support reasoning on the union of both structured KBs and unstructured text by aligning them in a common embedded space. In this paper we extend universal schema to natural language question answering, employing memory networks to attend to the large body of facts in the combination of text and KB. Our models can be trained in an end-to-end fashion on question-answer pairs. Evaluation results on the Spades fill-in-the-blank question answering dataset show that exploiting universal schema for question answering is better than using either a KB or text alone. This model also outperforms the current state-of-the-art by 8.5 $F_1$ points. Code and data are available at https://rajarshd.github.io/TextKBQA
- Genome-wide association studies (GWAS) have achieved great success in the genetic study of Alzheimer's disease (AD). Collaborative imaging genetics studies across different research institutions show the effectiveness of detecting genetic risk factors. However, the high dimensionality of GWAS data poses significant challenges in detecting risk SNPs for AD. Selecting relevant features is crucial in predicting the response variable. In this study, we propose a novel Distributed Feature Selection Framework (DFSF) to conduct large-scale imaging genetics studies across multiple institutions. To speed up the learning process, we propose a family of distributed group Lasso screening rules to identify irrelevant features and remove them from the optimization. Then we select the relevant group features by performing the group Lasso feature selection process over a sequence of parameters. Finally, we employ stability selection to rank the top risk SNPs that might help detect the early stage of AD. To the best of our knowledge, this is the first distributed feature selection model integrated with group Lasso feature selection and capable of detecting genetic risk factors across multiple research institutions. Empirical studies are conducted on 809 subjects with 5.9 million SNPs distributed across several institutions, demonstrating the efficiency and effectiveness of the proposed method.
- Apr 28 2017 cs.CV arXiv:1704.08382v1 This work introduces a novel framework for quantifying the presence and strength of recurrent dynamics in video data. Specifically, we provide continuous measures of periodicity (perfect repetition) and quasiperiodicity (superposition of periodic modes with non-commensurate periods), in a way which does not require segmentation, training, object tracking or 1-dimensional surrogate signals. Our methodology operates directly on video data. The approach combines ideas from nonlinear time series analysis (delay embeddings) and computational topology (persistent homology), by translating the problem of finding recurrent dynamics in video data, into the problem of determining the circularity or toroidality of an associated geometric space. Through extensive testing, we show the robustness of our scores with respect to several noise models/levels; we show that our periodicity score is superior to other methods when compared to human-generated periodicity rankings; and furthermore, we show that our quasiperiodicity score clearly indicates the presence of biphonation in videos of vibrating vocal folds.
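A small numpy sketch (ours) of the delay-embedding half of the pipeline. The paper operates directly on video; for brevity we embed a scalar toy signal, whose sliding-window point cloud traces a loop under periodicity and a torus under quasiperiodicity — the circularity/toroidality that persistent homology would then score.

```python
import numpy as np

t = np.linspace(0, 10 * np.pi, 2000)
signal = np.cos(t) + 0.5 * np.cos(np.sqrt(2) * t)    # quasiperiodic toy signal

def sliding_window(x, dim, tau):
    """Delay embedding: rows are (x_i, x_{i+tau}, ..., x_{i+(dim-1)tau})."""
    n = len(x) - (dim - 1) * tau
    return np.stack([x[i : i + n] for i in range(0, dim * tau, tau)], axis=1)

cloud = sliding_window(signal, dim=20, tau=5)
print(cloud.shape)   # point cloud; two non-commensurate modes trace out a torus
```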
- Apr 28 2017 cs.CL arXiv:1704.08381v1 Sequence-to-sequence models have shown strong performance across a broad range of applications. However, their application to parsing and generating text using Abstract Meaning Representation (AMR) has been limited, due to the relatively limited amount of labeled data and the non-sequential nature of the AMR graphs. We present a novel training procedure that can lift this limitation using millions of unlabeled sentences and careful preprocessing of the AMR graphs. For AMR parsing, our model achieves competitive results of 62.1 SMATCH, the current best score reported without significant use of external semantic resources. For AMR generation, our model establishes a new state-of-the-art performance of BLEU 33.8. We present extensive ablative and qualitative analysis including strong evidence that sequence-based AMR models are robust against ordering variations of graph-to-sequence conversions.
- Apr 28 2017 cs.MM arXiv:1704.08378v1 This paper presents an empirical study on applying convolutional neural networks (CNNs) to detecting J-UNIWARD, one of the most secure JPEG steganographic methods. Experiments guiding the architectural design of the CNNs have been conducted on the JPEG compressed BOSSBase containing 10,000 covers of size 512x512. Results have verified that both the pooling method and the depth of the CNNs are critical for performance. Results have also proved that a 20-layer CNN, in general, outperforms the most sophisticated feature-based methods, but its advantage gradually diminishes on hard-to-detect cases. To show that the performance generalizes to large-scale databases and to different cover sizes, one experiment has been conducted on the CLS-LOC dataset of ImageNet containing more than one million covers cropped to a unified size of 256x256. The proposed 20-layer CNN has cut the error achieved by a CNN recently proposed for large-scale JPEG steganalysis by 35%. Source code is available via GitHub: https://github.com/GuanshuoXu/deep_cnn_jpeg_steganalysis
- In machine learning, the use of artificial neural networks is the mainstream approach. Such a network consists of layers of neurons. These neurons are of the same type, characterized by two features: (1) an inner product of an input vector and a matching weighting vector of trainable parameters and (2) a nonlinear excitation function. Here we investigate the possibility of replacing the inner product with a quadratic function of the input vector, thereby upgrading the 1st order neuron to the 2nd order neuron, empowering individual neurons, and facilitating the optimization of neural networks. Also, numerical examples are provided to illustrate the feasibility and merits of the 2nd order neurons. Finally, further topics are discussed.
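A hedged sketch of a second-order neuron in the sense above, using the generic quadratic form $x^\top A x + w^\top x + b$ as the pre-activation; the paper's exact parameterization may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

class SecondOrderNeuron:
    def __init__(self, dim):
        self.A = 0.1 * rng.normal(size=(dim, dim))   # quadratic weights
        self.w = rng.normal(size=dim)                # linear weights
        self.b = 0.0

    def __call__(self, x):
        pre = x @ self.A @ x + self.w @ x + self.b   # quadratic pre-activation
        return np.tanh(pre)                          # nonlinear excitation

neuron = SecondOrderNeuron(4)
print(neuron(rng.normal(size=4)))
# A single such neuron can realize XOR-like (quadratic) decision boundaries
# that a first-order neuron cannot, which is one sense in which individual
# neurons are "empowered".
```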
- Apr 28 2017 cs.CL arXiv:1704.08352v1 Words can be represented by composing the representations of subword units such as word segments, characters, and/or character n-grams. While such representations are effective and may capture the morphological regularities of words, they have not been systematically compared, and it is not understood how they interact with different morphological typologies. On a language modeling task, we present experiments that systematically vary (1) the basic unit of representation, (2) the composition of these representations, and (3) the morphological typology of the language modeled. Our results extend previous findings that character representations are effective across typologies, and we find that a previously unstudied combination of character trigram representations composed with bi-LSTMs outperforms most others. But we also find room for improvement: none of the character-level models match the predictive accuracy of a model with access to true morphological analyses, even when learned from an order of magnitude more data.
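A sketch (ours) of the best-performing combination reported above: composing a word representation from its character trigrams with a bidirectional LSTM. Vocabulary handling, padding symbols, and sizes are placeholders.

```python
import torch
import torch.nn as nn

def char_trigrams(word):
    padded = f"#{word}#"                       # '#' as boundary marker (our choice)
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

trigrams = char_trigrams("cats")               # ['#ca', 'cat', 'ats', 'ts#']
vocab = {t: i for i, t in enumerate(sorted(set(trigrams)))}

emb = nn.Embedding(len(vocab), 16)
bilstm = nn.LSTM(input_size=16, hidden_size=32, bidirectional=True, batch_first=True)

ids = torch.tensor([[vocab[t] for t in trigrams]])        # (1, num_trigrams)
out, (h, _) = bilstm(emb(ids))
word_vec = torch.cat([h[0], h[1]], dim=-1)                # (1, 64) word representation
print(word_vec.shape)
```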
- Apr 28 2017 hep-th astro-ph.CO arXiv:1704.08351v1 We study the likelihood that relative minima of random polynomial potentials support the slow-roll conditions for inflation. Consistent with renormalizability and boundedness, the coefficients that appear in the potential are chosen to be order one with respect to the energy scale at which inflation transpires. Investigation of the single field case illustrates a window in which the potentials satisfy the slow-roll conditions. When there are two scalar fields, we find that the probability depends on the choice of distribution for the coefficients. A uniform distribution yields a $0.05\%$ probability of finding a suitable minimum in the random potential, whereas a maximum entropy distribution yields a $0.1\%$ probability.
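A back-of-envelope numerical sketch (ours, in arbitrary units, not the paper's procedure) of the single-field experiment: draw order-one coefficients for a bounded quartic potential, locate its relative minima, and test the slow-roll parameters $\epsilon = \frac{1}{2}(V'/V)^2$ and $\eta = V''/V$ at a small displacement from each minimum. The displacement and tolerances are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def trial():
    c = rng.uniform(-1, 1, size=3)                 # order-one coefficients
    c4 = abs(rng.uniform(-1, 1))                   # positive quartic term: bounded below
    V = np.polynomial.Polynomial([0.0, c[0], c[1], c[2], c4])
    dV, d2V = V.deriv(), V.deriv(2)
    minima = [r.real for r in dV.roots()
              if abs(r.imag) < 1e-9 and d2V(r.real) > 0 and V(r.real) > 0]
    for phi in minima:
        p = phi + 0.1                              # arbitrary small displacement
        if V(p) <= 0:
            continue
        eps = 0.5 * (dV(p) / V(p)) ** 2            # slow-roll epsilon = (V'/V)^2 / 2
        eta = d2V(p) / V(p)                        # slow-roll eta = V''/V
        if eps < 1e-2 and abs(eta) < 1e-2:         # illustrative tolerance for "<< 1"
            return True
    return False

print("fraction of suitable potentials:", np.mean([trial() for _ in range(2000)]))
```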
- Apr 28 2017 cs.AI arXiv:1704.08350v1 Current measures of machine intelligence are either difficult to evaluate or lack the ability to test a robot's problem-solving capacity in open worlds. We propose a novel evaluation framework based on the formal notion of a MacGyver Test, which provides a practical way of assessing the resilience and resourcefulness of artificial agents.
- Apr 28 2017 cs.CV arXiv:1704.08328v1 In this thesis, we study two problems based on clustering algorithms. In the first problem, we study the role of visual attributes, using an agglomerative clustering algorithm to whittle down the search area when the number of classes is high, in order to improve clustering performance. We observe that as we add more attributes, the clustering performance increases overall. In the second problem, we study the role of clustering in aggregating templates in a 1:N open set protocol using multi-shot video as a probe. We observe that by increasing the number of clusters, the performance increases with respect to the baseline and reaches a peak, after which increasing the number of clusters causes the performance to degrade. Experiments are conducted using the recently introduced unconstrained IARPA Janus IJB-A, CS2, and CS3 face recognition datasets.
- Cross-modal audio-visual perception has been a long-standing topic in psychology and neurology, and various studies have discovered strong correlations in human perception of auditory and visual stimuli. Despite work in computational multimodal modeling, the problem of cross-modal audio-visual generation has not been systematically studied in the literature. In this paper, we make the first attempt to solve this cross-modal generation problem, leveraging the power of deep generative adversarial training. Specifically, we use conditional generative adversarial networks to achieve cross-modal audio-visual generation of musical performances. We explore different encoding methods for audio and visual signals, and work on two scenarios: instrument-oriented generation and pose-oriented generation. Being the first to explore this new problem, we compose two new datasets with pairs of images and sounds of musical performances of different instruments. Our experiments using both classification and human evaluation demonstrate that our model has the ability to generate one modality, i.e., audio/visual, from the other modality, i.e., visual/audio, to a good extent. Our experiments on various design choices along with the datasets will facilitate future research in this new problem space.
- The notion of events occupies a central role in modeling and has influenced computer science and philosophy. Recent developments in diagrammatic modeling have made it possible to examine the conceptual representation of events. This paper explores some aspects of the notion of events that are produced by applying a new diagrammatic methodology, with a focus on the interaction of events with such concepts as time, space, and objects. The proposed description applies to abstract machines where events form the dynamic phases of a system. The results of this nontechnical research can be utilized in the many fields where the notion of an event is used in interdisciplinary applications.
- Under the banner of `Big Data', the detection and classification of structure in extremely large, high dimensional, data sets is one of the central statistical challenges of our times. Among the most intriguing approaches to this challenge is `TDA', or `Topological Data Analysis', one of the primary aims of which is providing non-metric, but topologically informative, pre-analyses of data sets which make later, more quantitative analyses feasible. While TDA rests on strong mathematical foundations from Topology, in applications it has faced challenges due to an inability to handle issues of statistical reliability and robustness and, most importantly, an inability to make scientific claims with verifiable levels of statistical confidence. We propose a methodology for the parametric representation, estimation, and replication of persistence diagrams, the main diagnostic tool of TDA. The power of the methodology lies in the fact that even if only one persistence diagram is available for analysis -- the typical case for big data applications -- replications can be generated to allow for conventional statistical hypothesis testing. The methodology is conceptually simple and computationally practical, and provides a broadly effective statistical procedure for persistence diagram TDA analysis. We demonstrate the basic ideas on a toy example, and the power of the approach in a novel and revealing analysis of CMB non-homogeneity.
- Recurrence networks and the associated statistical measures have become important tools in the analysis of time series data. In this work, we test how effective the recurrence network measures are in analyzing real world data involving two main types of noise, white noise and colored noise. We use two prominent network measures as discriminating statistics for hypothesis testing using surrogate data, for the specific null hypothesis that the data are derived from a linear stochastic process. We show that the characteristic path length is an especially efficient discriminating measure, with the conclusions reasonably accurate even with a limited number of data points in the time series. We also highlight an additional advantage of the network approach in identifying the dimensionality of the system underlying the time series, through a convergence measure derived from the probability distribution of the local clustering coefficients. As examples of real world data, we use the light curves from a prominent black hole system and show that a combined analysis using three primary network measures can provide vital information regarding the nature of the temporal variability of light curves from different spectroscopic classes.
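An illustrative sketch (ours) of the discriminating statistic: build an $\epsilon$-recurrence network from a delay-embedded time series and compute its characteristic path length with networkx. In the hypothesis test, this value would be compared against the same statistic computed on surrogate data; embedding parameters and the recurrence rate are placeholder choices.

```python
import numpy as np
import networkx as nx
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
x = rng.normal(size=500).cumsum()                 # toy colored-noise-like series

m, tau = 3, 2                                     # embedding dimension and delay
n = len(x) - (m - 1) * tau
pts = np.column_stack([x[i * tau : i * tau + n] for i in range(m)])

D = squareform(pdist(pts))
eps = np.quantile(D[D > 0], 0.05)                 # threshold fixing the recurrence rate
A = ((D < eps) & (D > 0)).astype(int)             # epsilon-recurrence network adjacency

G = nx.from_numpy_array(A)
giant = G.subgraph(max(nx.connected_components(G), key=len))
print("characteristic path length:", round(nx.average_shortest_path_length(giant), 2))
```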
- Apr 28 2017 cs.CV arXiv:1704.08331v1 Dynamic scene understanding is a challenging problem and motion segmentation plays a crucial role in solving it. Incorporating semantics and motion enhances the overall perception of a dynamic scene. For applications of outdoor robotic navigation, joint learning methods have not been extensively used for extracting spatio-temporal features or adding different priors into the formulation. The task becomes even more challenging without stereo information. This paper proposes an approach to fuse semantic features and motion clues using CNNs, to address the problem of monocular semantic motion segmentation. We deduce semantic and motion labels by integrating optical flow as a constraint with semantic features into a dilated convolution network. The pipeline consists of three main stages, i.e., feature extraction, feature amplification, and multi-scale context aggregation, to fuse the semantics and flow features. Our joint formulation shows significant improvements in monocular motion segmentation over state-of-the-art methods on the challenging KITTI tracking dataset.