# Top arXiv papers

• Gate model quantum computers with too many qubits to be simulated by available classical computers are about to arrive. We present a strategy for programming these devices without error correction or compilation. This means that the number of logical qubits is the same as the number of qubits on the device. The hardware determines which pairs of qubits can be addressed by unitary operators. The goal is to build quantum states that solve computational problems such as maximizing a combinatorial objective function or minimizing a Hamiltonian. These problems may not fit naturally on the physical layout of the qubits. Our algorithms use a sequence of parameterized unitaries that sit on the qubit layout to produce quantum states depending on those parameters. Measurements of the objective function (or Hamiltonian) guide the choice of new parameters, with the goal of moving the objective function up (or lowering the energy). As an example we consider finding approximate solutions to MaxCut on 3-regular graphs, where the hardware consists of physical qubits laid out on a rectangular grid. We prove that the lowest-depth version of the Quantum Approximate Optimization Algorithm will achieve an approximation ratio of at least 0.5293 on all large enough instances, which beats random guessing (0.5). We open up the algorithm to have different parameters for each single-qubit $X$ rotation and for each $ZZ$ interaction associated with the nearest-neighbor interactions on the grid. Small numerical experiments indicate that an enveloping classical algorithm can be used to find parameters that sit on the grid to optimize an objective function with a different connectivity. We discuss strategies for finding good parameters but offer no evidence yet that the proposed approach can beat the best classical algorithms. Ultimately the strength of this approach will be determined by running on actual hardware.
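A toy illustration (not from the paper): the MaxCut objective and the random-guessing baseline mentioned above can be checked by brute force on the smallest 3-regular graph, $K_4$. Only the objective and the 0.5 baseline come from the abstract; the graph choice is ours.

```python
import itertools

def maxcut_value(n_vertices, edges):
    """Brute-force the MaxCut objective C(z) = sum over edges (1 - z_i z_j)/2,
    where z_i in {-1, +1} labels the side of the cut for vertex i."""
    best = 0
    for signs in itertools.product((-1, 1), repeat=n_vertices):
        cut = sum((1 - signs[i] * signs[j]) // 2 for i, j in edges)
        best = max(best, cut)
    return best

# K4 is the smallest 3-regular graph: 4 vertices, 6 edges.
edges_k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(maxcut_value(4, edges_k4))  # -> 4: any balanced bipartition cuts 4 edges
print(len(edges_k4) / 2)          # random guessing cuts half the edges in expectation
```

A uniformly random assignment cuts each edge with probability 1/2, which is where the 0.5 baseline the paper beats comes from.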
• Many determinantal inequalities for positive definite block matrices are consequences of general entropy inequalities, specialised to Gaussian distributed vectors with prescribed covariances. In particular, strong subadditivity (SSA) yields $$\ln\det V_{AC} + \ln\det V_{BC} - \ln\det V_{ABC} - \ln\det V_{C} \geq 0$$ for all $3\times 3$-block matrices $V_{ABC}$, where subscripts identify principal submatrices. We shall refer to the above inequality as SSA of log-det entropy. In this paper we develop further insights on the properties of the above inequality and its applications to classical and quantum information theory. In the first part of the paper, we show how to find known and new necessary and sufficient conditions under which saturation with equality occurs. Subsequently, we discuss the role of the classical transpose channel (also known as Petz recovery map) in this problem and find its action explicitly. We then prove some extensions of the saturation theorem, by finding faithful lower bounds on a log-det conditional mutual information. In the second part, we focus on quantum Gaussian states, whose covariance matrices are not only positive but obey additional constraints due to the uncertainty relation. For Gaussian states, the log-det entropy is equivalent to the Rényi entropy of order $2$. We provide a strengthening of log-det SSA for quantum covariance matrices that involves the so-called Gaussian Rényi-$2$ entanglement of formation, a well-behaved entanglement measure defined via a Gaussian convex roof construction. We then employ this result to define a log-det entropy equivalent of the squashed entanglement, which is remarkably shown to coincide with the Gaussian Rényi-$2$ entanglement of formation. This allows us to establish useful properties of such measure(s), like monogamy, faithfulness, and additivity on Gaussian states.
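A quick numerical sanity check of the log-det SSA inequality above, on a hypothetical positive definite instance where the blocks $A$, $B$, $C$ are single indices:

```python
import math

def det(m):
    """Determinant by cofactor expansion along the first row
    (fine for the tiny matrices used here)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] *
               det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(n))

def submatrix(m, idx):
    """Principal submatrix on the index set idx."""
    return [[m[i][j] for j in idx] for i in idx]

# A fixed positive definite 3x3 matrix: V = M^T M + I is PD by construction.
M = [[1.0, 2.0, 0.5],
     [0.0, 1.0, 1.5],
     [2.0, 0.5, 1.0]]
V = [[sum(M[k][i] * M[k][j] for k in range(3)) + (1.0 if i == j else 0.0)
      for j in range(3)] for i in range(3)]

A, B, C = [0], [1], [2]
lhs = (math.log(det(submatrix(V, A + C))) + math.log(det(submatrix(V, B + C)))
       - math.log(det(V)) - math.log(det(submatrix(V, C))))
print(lhs >= 0)  # SSA of log-det entropy holds for this instance
```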
• Random tensor networks provide useful models that incorporate various important features of holographic duality. A tensor network is usually defined for a fixed graph geometry specified by the connection of tensors. In this paper, we generalize the random tensor network approach to allow quantum superposition of different spatial geometries. We set up a framework in which all possible bulk spatial geometries, characterized by weighted adjacency matrices of all possible graphs, are mapped to the boundary Hilbert space and form an overcomplete basis of the boundary. We call this overcomplete basis holographic coherent states. A generic boundary state can be expanded on this basis, which describes the state as a superposition of different spatial geometries in the bulk. We discuss how to define distinct classical geometries and small fluctuations around them. We show that small fluctuations around classical geometries define "code subspaces" which are mapped to the boundary Hilbert space isometrically with quantum error correction properties. In addition, we also show that the overlap between different geometries is suppressed exponentially as a function of the geometrical difference between the two geometries. The geometrical difference is measured in an area law fashion, which is a manifestation of the holographic nature of the states considered.
• Open quantum systems evolving according to discrete-time dynamics are capable, unlike continuous-time counterparts, to converge to a stable equilibrium in finite time with zero error. We consider dissipative quantum circuits consisting of sequences of quantum channels subject to specified quasi-locality constraints, and determine conditions under which stabilization of a pure multipartite entangled state of interest may be exactly achieved in finite time. Special emphasis is devoted to characterizing scenarios where finite-time stabilization may be achieved robustly with respect to the order of the applied quantum maps, as suitable for unsupervised control architectures. We show that if a decomposition of the physical Hilbert space into virtual subsystems is found, which is compatible with the locality constraint and relative to which the target state factorizes, then robust stabilization may be achieved by independently cooling each component. We further show that if the same condition holds for a scalable class of pure states, a continuous-time quasi-local Markov semigroup ensuring rapid mixing can be obtained. Somewhat surprisingly, we find that the commutativity of the canonical parent Hamiltonian one may associate to the target state does not directly relate to its finite-time stabilizability properties, although in all cases where we can guarantee robust stabilization, a (possibly non-canonical) commuting parent Hamiltonian may be found. Beside graph states, quantum states amenable to finite-time robust stabilization include a class of universal resource states displaying two-dimensional symmetry-protected topological order, along with tensor network states obtained by generalizing a construction due to Bravyi and Vyalyi. Extensions to representative classes of mixed graph-product and thermal states are also discussed.
• Physical theories can be characterized in terms of their state spaces and their evolutive equations. The kinematical structure and the dynamical structure of finite dimensional quantum theory are, in light of the Choi-Jamiołkowski isomorphism, one and the same --- namely the homogeneous self-dual cones of positive semi-definite linear endomorphisms on finite dimensional complex Hilbert spaces. From the perspective of category theory, these cones are the sets of morphisms in finite dimensional quantum theory as a dagger compact closed category. Understanding the intricate geometry of these cones and charting the wider landscape for their host category is imperative for foundational physics. In Part I of this thesis, we study the shape of finite dimensional quantum theory in terms of quantum information. In Part II of this thesis, we move beyond quantum theory within the vein of Euclidean Jordan algebras. In posting this thesis on the arXiv, we hope that it might serve as a useful resource for those interested in its subjects.
• Superconducting quantum circuits are promising candidates for building scalable quantum computers. Here, we use a four-qubit superconducting quantum processor to solve a two-dimensional system of linear equations based on a quantum algorithm proposed by Harrow, Hassidim, and Lloyd [Phys. Rev. Lett. 103, 150502 (2009)], which promises an exponential speedup over classical algorithms under certain circumstances. We benchmark the solver with quantum inputs and outputs, and characterize it by non-trace-preserving quantum process tomography, which yields a process fidelity of $0.837\pm0.006$. Our results highlight the potential of superconducting quantum circuits for applications in solving large-scale linear systems, a ubiquitous task in science and engineering.
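For reference, the classical side of a two-dimensional linear-system task is tiny. A sketch of the 2x2 solve, with the solution then normalized the way the HHL algorithm encodes it as a quantum state (the matrix here is a hypothetical Hermitian instance, not the one run on the processor):

```python
import math

def solve_2x2(A, b):
    """Classical reference solution of A x = b for a 2x2 system (Cramer's rule)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - b[1] * A[0][1]) / det,
            (b[1] * A[0][0] - b[0] * A[1][0]) / det]

def normalize(v):
    """HHL outputs the solution only up to normalization, as amplitudes of a state."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

A = [[1.5, 0.5], [0.5, 1.5]]   # Hermitian, as HHL assumes (hypothetical instance)
b = [1.0, 0.0]
x = solve_2x2(A, b)            # -> [0.75, -0.25]
print(normalize(x))            # unit-norm "quantum state" version of the solution
```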
• The driving force behind deep networks is their ability to compactly represent rich classes of functions. The primary notion for formally reasoning about this phenomenon is expressive efficiency, which refers to a situation where one network must grow unfeasibly large in order to realize (or approximate) functions of another. To date, expressive efficiency analyses have focused on the architectural feature of depth, showing that deep networks are representationally superior to shallow ones. In this paper we study the expressive efficiency brought forth by connectivity, motivated by the observation that modern networks interconnect their layers in elaborate ways. We focus on dilated convolutional networks, a family of deep models delivering state of the art performance in sequence processing tasks. By introducing and analyzing the concept of mixed tensor decompositions, we prove that interconnecting dilated convolutional networks can lead to expressive efficiency. In particular, we show that even a single connection between intermediate layers can already lead to an almost quadratic gap, which in large-scale settings typically makes the difference between a model that is practical and one that is not. Empirical evaluation demonstrates how the expressive efficiency of connectivity, similarly to that of depth, translates into gains in accuracy. This leads us to believe that expressive efficiency may serve a key role in the development of new tools for deep network design.
• Mar 21 2017 cs.CV arXiv:1703.06817v1
Convolutional Neural Networks (CNNs) have been successfully applied to many computer vision tasks, such as image classification. By performing linear combinations and element-wise nonlinear operations, these networks can be thought of as extracting solely first-order information from an input image. In the past, however, second-order statistics computed from handcrafted features, e.g., covariances, have proven highly effective in diverse recognition tasks. In this paper, we introduce a novel class of CNNs that exploit second-order statistics. To this end, we design a series of new layers that (i) extract a covariance matrix from convolutional activations, (ii) compute a parametric, second-order transformation of a matrix, and (iii) perform a parametric vectorization of a matrix. These operations can be assembled to form a Covariance Descriptor Unit (CDU), which replaces the fully-connected layers of standard CNNs. Our experiments demonstrate the benefits of our new architecture, which outperforms first-order CNNs while relying on up to 90% fewer parameters.
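Step (i) above, extracting a covariance matrix from convolutional activations, can be sketched as follows. This is a minimal stand-in on toy data, not the paper's implementation; `features` plays the role of d-dimensional activation vectors gathered over spatial positions.

```python
def covariance(features):
    """Sample covariance of d-dimensional activation vectors collected over
    n spatial positions; `features` is an n x d list of lists."""
    n, d = len(features), len(features[0])
    mean = [sum(f[j] for f in features) / n for j in range(d)]
    return [[sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in features) / (n - 1)
             for j in range(d)] for i in range(d)]

acts = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # toy activations at 3 positions
C = covariance(acts)
print(C)  # -> [[1.0, 2.0], [2.0, 4.0]], a symmetric 2x2 covariance matrix
```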
• This paper contributes a new large-scale dataset for weakly supervised cross-media retrieval, named Twitter100k. Current datasets, such as Wikipedia, NUS Wide and Flickr30k, have two major limitations. First, these datasets are lacking in content diversity, i.e., only some pre-defined classes are covered. Second, texts in these datasets are written in well-organized language, leading to inconsistency with realistic applications. To overcome these drawbacks, the proposed Twitter100k dataset is characterized by two aspects: 1) it has 100,000 image-text pairs randomly crawled from Twitter and thus has no constraint in the image categories; 2) text in Twitter100k is written in informal language by the users. Since strongly supervised methods leverage class labels that may be missing in practice, this paper focuses on weakly supervised learning for cross-media retrieval, in which only text-image pairs are exploited during training. We extensively benchmark the performance of four subspace learning methods and three variants of the Correspondence AutoEncoder, along with various text features, on Wikipedia, Flickr30k and Twitter100k. Novel insights are provided. As a minor contribution, inspired by the characteristics of Twitter100k, we propose an OCR-based cross-media retrieval method. In experiments, we show that the proposed OCR-based method improves the baseline performance.
• Due to the availability of references of research papers and the rich information contained in papers, various citation analysis approaches have been proposed to identify similar documents for scholar recommendation. Despite the success of previous approaches, they are, however, based on the co-occurrence of items; when no co-occurring items are available in documents, they do not work well. Inspired by distributed representations of words in the natural language processing literature, we propose a novel approach to measuring the similarity of papers based on distributed representations learned from the citation contexts of papers. We view the set of papers as the vocabulary, define the weighted citation context of papers, and convert it to a weight matrix similar to the word-word co-occurrence matrix in natural language processing. We then explore a variant of the matrix factorization approach to train distributed representations of papers on this matrix, and leverage the distributed representations to measure similarities between papers. In experiments, we show that our approach outperforms state-of-the-art citation-based approaches by 25%, and performs better than other distributed-representation-based methods.
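A minimal sketch of the co-occurrence construction described above, on hypothetical citation contexts. The paper goes on to factorize this matrix to obtain distributed representations; here, for brevity, raw matrix rows are compared directly with cosine similarity.

```python
import math

# Hypothetical citation contexts: for each citing paper, the set of papers
# cited together, from which pairwise co-occurrence counts are accumulated.
contexts = [["p1", "p2", "p3"],
            ["p1", "p2"],
            ["p3", "p4"]]

vocab = sorted({p for ctx in contexts for p in ctx})
index = {p: i for i, p in enumerate(vocab)}

# Paper-paper co-occurrence matrix, analogous to a word-word matrix in NLP.
cooc = [[0.0] * len(vocab) for _ in vocab]
for ctx in contexts:
    for a in ctx:
        for b in ctx:
            if a != b:
                cooc[index[a]][index[b]] += 1.0

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v)))

# Similarity of p1 and p2 driven by their shared co-cited neighbor p3.
print(cosine(cooc[index["p1"]], cooc[index["p2"]]))  # -> 0.2
```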
• We introduce the first goal-driven training for visual question answering and dialog agents. Specifically, we pose a cooperative 'image guessing' game between two agents -- Qbot and Abot -- who communicate in natural language dialog so that Qbot can select an unseen image from a lineup of images. We use deep reinforcement learning (RL) to learn the policies of these agents end-to-end -- from pixels to multi-agent multi-round dialog to game reward. We demonstrate two experimental results. First, as a 'sanity check' demonstration of pure RL (from scratch), we show results on a synthetic world, where the agents communicate in ungrounded vocabulary, i.e., symbols with no pre-specified meanings (X, Y, Z). We find that two bots invent their own communication protocol and start using certain symbols to ask/answer about certain visual attributes (shape/color/style). Thus, we demonstrate the emergence of grounded language and communication among 'visual' dialog agents with no human supervision. Second, we conduct large-scale real-image experiments on the VisDial dataset, where we pretrain with supervised dialog data and show that the RL 'fine-tuned' agents significantly outperform SL agents. Interestingly, the RL Qbot learns to ask questions that Abot is good at, ultimately resulting in more informative dialog and a better team.
• Zero-shot learning (ZSL) is a challenging task aiming at recognizing novel classes without any training instances. In this paper we present a simple but high-performance ZSL approach by generating pseudo feature representations (GPFR). Given the dataset of seen classes and side information of unseen classes (e.g. attributes), we synthesize feature-level pseudo representations for novel concepts, which gives us access to the formulation of an unseen-class predictor. Firstly we design a Joint Attribute Feature Extractor (JAFE) to acquire understandings about attributes, then construct a cognitive repository of attributes filtered by confidence margins, and finally generate pseudo feature representations using a probability-based sampling strategy to facilitate the subsequent training of the class predictor. We demonstrate the effectiveness in ZSL settings and the extensibility in a supervised recognition scenario of our method on a synthetic colored MNIST dataset (C-MNIST). For several popular ZSL benchmark datasets, our approach also shows compelling results on the zero-shot recognition task, especially a substantial improvement over the state-of-the-art mAP on the zero-shot retrieval task.
• Mar 21 2017 cs.IR cs.LG stat.ML arXiv:1703.06324v1
Learning an encoding of feature vectors in terms of an over-complete dictionary or a probabilistic information-geometric construct (Fisher vectors) is widespread in statistical signal processing and computer vision. In content-based information retrieval using deep-learning classifiers, such encodings are learnt on the flattened last layer, without adherence to the multi-linear structure of the underlying feature tensor. We illustrate a variety of feature encodings, including sparse dictionary coding and Fisher vectors, and propose a structured tensor factorization scheme that enables us to perform retrieval that is on par, in terms of average precision, with Fisher-vector-encoded image signatures. In short, we illustrate how structural constraints increase retrieval fidelity.
• Machine learning has proven to be a valuable tool to approximate functions in high-dimensional spaces. Unfortunately, analysis of these models to extract the relevant physics is never as easy as applying machine learning to a large dataset in the first place. Here we present a description of atomic systems that generates machine learning representations with a direct path to physical interpretation. As an example, we demonstrate its usefulness as a universal descriptor of grain boundary systems. Grain boundaries in crystalline materials are a quintessential example of a complex, high-dimensional system with broad impact on many physical properties including strength, ductility, corrosion resistance, crack resistance, and conductivity. In addition to modeling such properties, the method also provides insight into the physical "building blocks" that influence them. This opens the way to discover the underlying physics behind behaviors by understanding which building blocks map to particular properties. Once the structures are understood, they can then be optimized for desirable behaviors.
• Mar 21 2017 cs.CV arXiv:1703.06211v2
Convolutional neural networks (CNNs) are inherently limited in modeling geometric transformations due to the fixed geometric structures in their building modules. In this work, we introduce two new modules to enhance the transformation modeling capacity of CNNs, namely, deformable convolution and deformable RoI pooling. Both are based on the idea of augmenting the spatial sampling locations in the modules with additional offsets and learning the offsets from target tasks, without additional supervision. The new modules can readily replace their plain counterparts in existing CNNs and can be easily trained end-to-end by standard back-propagation, giving rise to deformable convolutional networks. Extensive experiments validate the effectiveness of our approach on sophisticated vision tasks of object detection and semantic segmentation. The code will be released.
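The key primitive behind sampling at learned offsets is bilinear interpolation, which lets a convolution read a feature map at fractional locations. A minimal single-channel sketch (toy data, not the paper's CUDA implementation):

```python
def bilinear_sample(img, y, x):
    """Sample a 2-D feature map at a fractional location (y, x) by bilinear
    interpolation -- the operation that lets deformable convolution read
    values at learned, non-integer offsets."""
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, len(img) - 1), min(x0 + 1, len(img[0]) - 1)
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * img[y0][x0] + (1 - dy) * dx * img[y0][x1]
            + dy * (1 - dx) * img[y1][x0] + dy * dx * img[y1][x1])

img = [[0.0, 1.0],
       [2.0, 3.0]]
print(bilinear_sample(img, 0.0, 1.0))  # integer location: exact value 1.0
print(bilinear_sample(img, 0.5, 0.5))  # center: average of all four, 1.5
```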
• We propose the use of specific dynamical processes and, more generally, of ideas from physics to model the evolution in time of musical structures. We apply this approach to two Études by F. Chopin, namely op.10 n.3 and op.25 n.1, proposing some original descriptions based on concepts of symmetry breaking/restoration and quantum coherence, which could be useful for interpretation. In this analysis, we take advantage of colored musical scores, obtained by applying Scriabin's color code for sounds to the musical notation.
• Mar 21 2017 cs.CV arXiv:1703.06870v2
• A channel for the formation of stellar mass black holes (BHs) is through hierarchical mergers of smaller BHs. Repeated mergers between comparable mass BHs leave an imprint on the spin of the resulting black hole, since the final BH spin is largely determined by the orbital angular momentum of the merging binary system. It has been shown that a population of supermassive BHs that forms through repeated mergers will have a distribution of spin magnitudes centered around a dimensionless spin magnitude of $a \sim 0.7$. We show that for stellar mass BHs forming from hierarchical mergers, the distribution of spin magnitudes is universal, with a peak at $a \sim 0.7$ and little support below $a \sim 0.4$. This is largely insensitive to details of the hierarchical merger model: within realistic astrophysical scenarios, the spin distribution is robust against changes to the mass ratio of the merging binaries, the initial spin distribution of the first generation of BHs, and the number of merger generations. Using the universal distribution, we explore the ability of spin measurements from ground-based gravitational wave detectors to constrain hierarchical merger scenarios. We apply a hierarchical Bayesian mixture model to mock gravitational wave data and argue that the fraction of BHs that formed through hierarchical mergers will be constrained with $\mathcal{O}(100)$ LIGO BBH detections. We also argue that a BH with spin magnitude $a < 0.4$ is unlikely to have formed hierarchically, and with $\mathcal{O}(10)$ detections we could falsify a model in which all component BHs form hierarchically. [Abridged]
• Gatys et al. recently introduced a neural algorithm that renders a content image in the style of another image, achieving so-called style transfer. However, their framework requires a slow iterative optimization process, which limits its practical application. Fast approximations with feed-forward neural networks have been proposed to speed up neural style transfer. Unfortunately, the speed improvement comes at a cost: the network is usually tied to a fixed set of styles and cannot adapt to arbitrary new styles. In this paper, we present a simple yet effective approach that for the first time enables arbitrary style transfer in real-time. At the heart of our method is a novel adaptive instance normalization (AdaIN) layer that aligns the mean and variance of the content features with those of the style features. Our method achieves speed comparable to the fastest existing approach, without the restriction to a pre-defined set of styles. In addition, our approach allows flexible user controls such as content-style trade-off, style interpolation, color & spatial controls, all using a single feed-forward neural network.
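The AdaIN operation described above is a two-line statistic match: shift and scale the content features to take on the mean and variance of the style features. A single-channel sketch (the real layer operates per channel over spatial positions; the data here is hypothetical):

```python
import math

def adain(content, style, eps=1e-5):
    """Adaptive instance normalization: normalize the content features,
    then rescale/shift them to the style features' variance and mean."""
    mu_c = sum(content) / len(content)
    mu_s = sum(style) / len(style)
    var_c = sum((x - mu_c) ** 2 for x in content) / len(content)
    var_s = sum((x - mu_s) ** 2 for x in style) / len(style)
    sd_c, sd_s = math.sqrt(var_c + eps), math.sqrt(var_s + eps)
    return [sd_s * (x - mu_c) / sd_c + mu_s for x in content]

content = [1.0, 2.0, 3.0, 4.0]
style = [10.0, 20.0, 30.0, 40.0]
out = adain(content, style)
print(sum(out) / len(out))  # output mean matches the style mean, 25.0
```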
• Under certain circumstances, a swarm of a species of trail-laying ants known as army ants can become caught in a doomed revolving motion known as the death spiral, in which each ant follows the one in front of it in a never-ending loop until they all drop dead from exhaustion. This phenomenon, as well as the ordinary motions of many ant species and certain slime molds, can be modeled using reinforced random walks and random walks with memory. In a reinforced random walk, the path taken by a moving particle is influenced by the previous paths taken by other particles. In a random walk with memory, a particle is more likely to continue along its line of motion than change its direction. Both memory and reinforcement have been studied independently in random walks with interesting results. However, real biological motion is a result of a combination of both memory and reinforcement. In this paper, we construct a continuous random walk model based on diffusion-advection partial differential equations that combine memory and reinforcement. We find an axisymmetric, time-independent solution to the equations that resembles the death spiral. Finally, we show numerically that the obtained steady-state solution is stable.
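A discrete 1-D toy version of the two ingredients, memory (a bias toward repeating the previous step) and reinforcement (a bias toward previously visited sites). The weighting scheme is our own hypothetical illustration, not the paper's continuous PDE model:

```python
import random

def walk_with_memory_and_reinforcement(steps, persistence=0.7,
                                       reinforcement=1.0, seed=0):
    """1-D toy walk: each step is drawn with weights that favor (a) the
    previous direction of motion and (b) sites visited more often before."""
    rng = random.Random(seed)
    pos, prev_step = 0, 1
    visits = {0: 1}
    path = [0]
    for _ in range(steps):
        # Reinforcement: weight each neighbor by 1 + reinforcement * past visits.
        w_right = 1.0 + reinforcement * visits.get(pos + 1, 0)
        w_left = 1.0 + reinforcement * visits.get(pos - 1, 0)
        # Memory: extra weight on continuing in the previous direction.
        if prev_step == 1:
            w_right *= persistence / (1 - persistence)
        else:
            w_left *= persistence / (1 - persistence)
        step = 1 if rng.random() < w_right / (w_right + w_left) else -1
        pos, prev_step = pos + step, step
        visits[pos] = visits.get(pos, 0) + 1
        path.append(pos)
    return path

path = walk_with_memory_and_reinforcement(100)
print(len(path))  # -> 101 positions: the start plus one per step
```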
• The sufficient decrease technique has been widely used in deterministic optimization, even for non-convex optimization problems, such as line-search techniques. Motivated by those successes, we propose a novel sufficient decrease framework for a class of variance reduced stochastic gradient descent (VR-SGD) methods such as SVRG and SAGA. In order to enforce sufficient decrease in stochastic optimization, we design a new sufficient decrease criterion. We then introduce a coefficient $\theta$ to satisfy the sufficient decrease property, which takes the decisions to shrink, expand or move in the opposite direction (i.e., $\theta x$ for the variable $x$), and give two specific update rules for Lasso and ridge regression. Moreover, we analyze the convergence properties of our algorithms for strongly convex problems, which show that both of our algorithms attain linear convergence rates. We also provide the convergence guarantees of both of our algorithms for non-strongly convex problems. Our experimental results further verify that our algorithms achieve better performance than their counterparts.
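For context, a plain SVRG loop on a 1-D ridge problem: the variance-reduced gradient is the stochastic gradient corrected by the same gradient at a snapshot, plus the full gradient at that snapshot. This sketch omits the paper's sufficient-decrease coefficient $\theta$, and all constants are hypothetical:

```python
import random

def svrg_ridge_1d(a, b, lam=0.1, lr=0.01, epochs=30, seed=0):
    """Plain SVRG on min_x (1/n) sum_i (a_i x - b_i)^2 + lam x^2."""
    rng = random.Random(seed)
    n = len(a)
    def grad_i(x, i):
        return 2 * a[i] * (a[i] * x - b[i]) + 2 * lam * x
    x = 0.0
    for _ in range(epochs):
        snapshot = x
        mu = sum(grad_i(snapshot, i) for i in range(n)) / n  # full gradient
        for _ in range(2 * n):
            i = rng.randrange(n)
            # Variance-reduced stochastic gradient.
            x -= lr * (grad_i(x, i) - grad_i(snapshot, i) + mu)
    return x

a, b = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
# Closed-form minimizer: sum(a_i b_i) / (sum(a_i^2) + n * lam).
x_star = sum(ai * bi for ai, bi in zip(a, b)) / (sum(ai * ai for ai in a) + len(a) * 0.1)
x_hat = svrg_ridge_1d(a, b)
print(x_hat, x_star)  # x_hat approaches the closed-form ridge solution
```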
• The focus of this work is to study how to efficiently tailor Convolutional Neural Networks (CNNs) towards learning timbre representations from log-mel magnitude spectrograms. We first review the trends when designing CNN architectures. Through this literature overview we discuss the crucial points to consider for efficiently learning timbre representations using CNNs. From this discussion we propose a design strategy meant to capture the relevant time-frequency contexts for learning timbre, which permits using domain knowledge for designing architectures. In addition, one of our main goals is to design efficient CNN architectures, which reduces the risk of over-fitting since the number of parameters is minimized. Several architectures based on the design principles we propose are successfully assessed for different research tasks related to timbre: singing voice phoneme classification, musical instrument recognition and music auto-tagging.
• This paper introduces QMDP-net, a neural network architecture for planning under partial observability. The QMDP-net combines the strengths of model-free learning and model-based planning. It is a recurrent policy network, but it represents a policy by connecting a model with a planning algorithm that solves the model, thus embedding the solution structure of planning in the network architecture. The QMDP-net is fully differentiable and allows end-to-end training. We train a QMDP-net over a set of different environments so that it can generalize over new ones. In preliminary experiments, QMDP-net showed strong performance on several robotic tasks in simulation. Interestingly, it also sometimes outperformed the QMDP algorithm, which generated the data for learning, because of QMDP-net's robustness resulting from end-to-end learning.
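The QMDP approximation that the network embeds can be stated in a few lines: run value iteration on the fully observable MDP, then score each action against the belief as $Q(b,a)=\sum_s b(s)\,Q(s,a)$. A sketch on a hypothetical 2-state, 2-action problem (the network, unlike this sketch, learns the model end-to-end):

```python
def qmdp_policy(T, R, belief, gamma=0.95, iters=100):
    """QMDP action selection: value iteration on the underlying MDP,
    then Q(b, a) = sum_s b(s) * Q(s, a)."""
    n_s = len(T)        # T[s][a][s'] transition probabilities
    n_a = len(T[0])
    V = [0.0] * n_s
    for _ in range(iters):
        V = [max(R[s][a] + gamma * sum(T[s][a][sp] * V[sp] for sp in range(n_s))
                 for a in range(n_a)) for s in range(n_s)]
    Q = [[R[s][a] + gamma * sum(T[s][a][sp] * V[sp] for sp in range(n_s))
          for a in range(n_a)] for s in range(n_s)]
    scores = [sum(belief[s] * Q[s][a] for s in range(n_s)) for a in range(n_a)]
    return scores.index(max(scores))

# Hypothetical problem: two absorbing states; action 1 pays off only in state 1.
T = [[[1.0, 0.0], [1.0, 0.0]],
     [[0.0, 1.0], [0.0, 1.0]]]
R = [[0.0, 0.0], [0.0, 1.0]]
print(qmdp_policy(T, R, belief=[0.2, 0.8]))  # belief favors state 1 -> action 1
```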
• As an emerging research topic, online class imbalance learning often combines the challenges of both class imbalance and concept drift. It deals with data streams having very skewed class distributions, where concept drift may occur. It has recently received increased research attention; however, very little work addresses the combined problem where both class imbalance and concept drift coexist. As the first systematic study of handling concept drift in class-imbalanced data streams, this paper first provides a comprehensive review of current research progress in this field, including current research focuses and open challenges. Then, an in-depth experimental study is performed, with the goal of understanding how to best overcome concept drift in online learning with class imbalance. Based on the analysis, a general guideline is proposed for the development of an effective algorithm.
• Translating information between text and image is a fundamental problem in artificial intelligence that connects natural language processing and computer vision. In the past few years, performance in image caption generation has seen significant improvement through the adoption of recurrent neural networks (RNN). Meanwhile, text-to-image generation has begun to generate plausible images using datasets of specific categories like birds and flowers. We've even seen image generation from multi-category datasets such as the Microsoft Common Objects in Context (MSCOCO) through the use of generative adversarial networks (GANs). Synthesizing objects with a complex shape, however, is still challenging. For example, animals and humans have many degrees of freedom, which means that they can take on many complex shapes. We propose a new training method called Image-Text-Image (I2T2I) which integrates text-to-image and image-to-text (image captioning) synthesis to improve the performance of text-to-image synthesis. We demonstrate that I2T2I can generate better multi-category images using MSCOCO than the state-of-the-art. We also demonstrate that I2T2I can achieve transfer learning by using a pre-trained image captioning module to generate human images on the MPII Human Pose dataset.
• SKA precursors are capable of detecting hundreds of galaxies in HI in a single 12-hour pointing. Deeper surveys will more easily probe faint HI structures, typically located in the vicinity of galaxies, such as tails, filaments, and extraplanar gas. Interactive visualization has proven fundamental for the exploration of such data, as it gives users immediate feedback when manipulating the data. We have developed SlicerAstro, a 3-D interactive viewer with new analysis capabilities, based on traditional 2-D input/output hardware. These capabilities enhance data inspection, allowing faster analysis of complex sources than with traditional tools. SlicerAstro is an open-source extension of 3DSlicer, a multi-platform open-source software package for visualization and medical image processing. We demonstrate the capabilities of the current stable binary release of SlicerAstro, which offers the following features: i) handling of FITS files and astronomical coordinate systems; ii) coupled 2-D/3-D visualization; iii) interactive filtering; iv) interactive 3-D masking; and v) interactive 3-D modeling. In addition, SlicerAstro has been designed with a strong, stable and modular C++ core, and its classes are also accessible via Python scripting, allowing great flexibility for user-customized visualization and analysis tasks.
• The number of documents available on the Internet grows every day. For this reason, processing this amount of information effectively and expressively becomes a major concern for companies and scientists. Methods that represent a textual document by a topic representation are widely used in Information Retrieval (IR) to process big data such as Wikipedia articles. One of the main difficulties in using topic models on huge data collections is related to the material resources (CPU time and memory) required for model estimation. To deal with this issue, we propose to build topic spaces from summarized documents. In this paper, we present a study of topic space representation in the context of big data. The topic space representation behavior is analyzed on different languages. Experiments show that topic spaces estimated from text summaries are as relevant as those estimated from the complete documents. The real advantage of such an approach is the processing time gain: we show that the processing time can be drastically reduced using summarized documents (more than 60% in general). This study finally points out the differences between thematic representations of documents depending on the targeted languages, such as English or Latin languages.
• These are the proceedings of the 14th International Workshop on Formal Engineering approaches to Software Components and Architectures (FESCA). The workshop was held on April 22, 2017 in Uppsala (Sweden) as a satellite event to the European Joint Conference on Theory and Practice of Software (ETAPS'17). The aim of the FESCA workshop is to bring together junior researchers from formal methods, software engineering, and industry interested in the development and application of formal modelling approaches as well as associated analysis and reasoning techniques with practical benefits for software engineering. In recent years, the growing importance of functional correctness and the increased relevance of system quality properties (e.g. performance, reliability, security) have stimulated the emergence of analytical and modelling techniques for the design and development of software systems. With the increasing complexity and utilization of today's software systems, FESCA aims at addressing two research questions: (1) what role is played by the software design phase in the systematic addressing of the analytical and modelling challenges, and (2) how can formal and semi-formal techniques be effectively applied to make the issues easier to address automatically, with lower human intervention.
• The study of eye gaze fixations on photographic images is an active research area. In contrast, the image subcategory of freehand sketches has not received as much attention for such studies. In this paper, we analyze the results of a free-viewing gaze fixation study conducted on 3904 freehand sketches distributed across 160 object categories. Our analysis shows that fixation sequences exhibit marked consistency within a sketch, across sketches of a category and even across suitably grouped sets of categories. This multi-level consistency is remarkable given the variability in depiction and extreme image content sparsity that characterizes hand-drawn object sketches. In our paper, we show that the multi-level consistency in the fixation data can be exploited to (a) predict a test sketch's category given only its fixation sequence and (b) build a computational model which predicts part-labels underlying fixations on objects. We hope that our findings motivate the community to deem sketch-like representations worthy of gaze-based studies vis-a-vis photographic images.
• We compute the group of braided tensor autoequivalences and the Brauer-Picard group of the representation category of the small quantum group $\mathfrak{u}_q(\mathfrak{g})$, where $q$ is a root of unity.
• A selective classifier (f,g) comprises a classification function f and a binary selection function g, which determines if the classifier abstains from prediction, or uses f to predict. The classifier is called pointwise-competitive if it classifies each point identically to the best classifier in hindsight (from the same class), whenever it does not abstain. The quality of such a classifier is quantified by its rejection mass, defined to be the probability mass of the points it rejects. A "fast" rejection rate is achieved if the rejection mass is bounded from above by O(1/m) where m is the number of labeled examples used to train the classifier (and O hides logarithmic factors). Pointwise-competitive selective (PCS) classifiers are intimately related to disagreement-based active learning and it is known that in the realizable case, a fast rejection rate of a known PCS algorithm (called Consistent Selective Strategy) is equivalent to an exponential speedup of the well-known CAL active algorithm. We focus on the agnostic setting, for which there is a known algorithm called LESS that learns a PCS classifier and achieves a fast rejection rate (depending on Hanneke's disagreement coefficient) under strong assumptions. We present an improved PCS learning algorithm called ILESS for which we show a fast rate (depending on Hanneke's disagreement coefficient) without any assumptions. Our rejection bound smoothly interpolates the realizable and agnostic settings. The main result of this paper is an equivalence between the following three entities: (i) the existence of a fast rejection rate for any PCS learning algorithm (such as ILESS); (ii) a poly-logarithmic bound for Hanneke's disagreement coefficient; and (iii) an exponential speedup for a new disagreement-based active learner called ActiveiLESS.
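The pointwise-competitive idea is easy to see in a 1-D realizable toy example: with threshold classifiers, the Consistent Selective Strategy predicts only where all hypotheses consistent with the labeled data agree, and abstains in the disagreement region. A minimal sketch (the hypothesis class and data below are illustrative, not from the paper):

```python
def selective_predict(f, g, x):
    """A selective classifier (f, g): use f to predict when g accepts x, abstain otherwise."""
    return f(x) if g(x) else None

def rejection_mass(g, sample):
    """Empirical rejection mass: the fraction of points on which g abstains."""
    return sum(1 for x in sample if not g(x)) / len(sample)

# Realizable 1-D toy: hypothesis class = thresholds h_t(x) = 1 iff x >= t.
# Every threshold consistent with the labeled data lies in (lo, hi]; the
# Consistent Selective Strategy abstains exactly where such thresholds
# disagree, so wherever it does predict, it agrees with the best
# classifier in hindsight -- i.e., it is pointwise-competitive.
labeled = [(0.1, 0), (0.3, 0), (0.7, 1), (0.9, 1)]
lo = max(x for x, y in labeled if y == 0)   # largest negative example: 0.3
hi = min(x for x, y in labeled if y == 1)   # smallest positive example: 0.7

f = lambda x: 1 if x >= hi else 0
g = lambda x: x <= lo or x >= hi            # abstain inside the disagreement region
```

The rejection mass is then the probability mass of the interval (lo, hi), which shrinks as more labels arrive; the paper's rates quantify how fast.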
• Understanding the relationship between spontaneous stochastic fluctuations and the topology of the underlying gene regulatory network is fundamental for the study of gene expression, especially at the molecular level. Here by solving the analytical steady-state distribution of the protein copy number in a general kinetic model of gene expression, we reveal a quantitative relation between stochastic fluctuations and feedback regulation at the single-molecule level, which provides novel insights into how and to what extent a feedback loop can enhance or suppress molecular fluctuations. Based on such relation, we also develop an effective method to extract the topological information of gene regulatory networks from single-cell gene expression data. The theory is demonstrated by numerical simulations and validated quantitatively by single-cell data analysis of a synthetic gene circuit integrated in human kidney cells.
• The massive amount of data potentially available for pattern discovery in machine learning is a challenge for kernel-based algorithms with respect to runtime and storage capacity. Local approaches may help to relieve these issues. From a statistical point of view, local approaches additionally allow different structures in the data to be treated in different ways. This paper analyzes properties of localized kernel-based, non-parametric statistical machine learning methods, in particular of support vector machines (SVMs) and methods close to them. We show that locally learnt kernel methods are universally consistent. Furthermore, we give an upper bound on the maxbias in order to establish the statistical robustness of the proposed method.
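The runtime/storage motivation for localization can be sketched as follows, assuming a nearest-prototype partition and kernel ridge regression standing in for the local kernel method (the paper studies SVMs and related methods; all names and parameter values below are illustrative). Each local fit solves a small system instead of one large one:

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between row-sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class LocalKRR:
    """Localized kernel method: partition inputs by nearest prototype, then
    fit an independent kernel ridge classifier per cell. Each local fit
    solves a small m_k x m_k linear system instead of one n x n system."""
    def __init__(self, prototypes, lam=1e-3, gamma=1.0):
        self.P, self.lam, self.gamma = prototypes, lam, gamma

    def _cell(self, X):
        return ((X[:, None, :] - self.P[None, :, :]) ** 2).sum(-1).argmin(1)

    def fit(self, X, y):
        cells = self._cell(X)
        self.models = {}
        for c in range(len(self.P)):
            Xc, yc = X[cells == c], y[cells == c]
            K = rbf(Xc, Xc, self.gamma)
            alpha = np.linalg.solve(K + self.lam * np.eye(len(Xc)), yc)
            self.models[c] = (Xc, alpha)
        return self

    def predict(self, X):
        out = np.zeros(len(X))
        cells = self._cell(X)
        for c, (Xc, alpha) in self.models.items():
            mask = cells == c
            if mask.any():
                out[mask] = rbf(X[mask], Xc, self.gamma) @ alpha
        return np.sign(out)

# Toy data: four clusters; the label marks whether a point is near its cluster center.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]])
X = np.vstack([rng.uniform(-1.2, 1.2, size=(80, 2)) + c for c in centers])
dists = np.min(np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1), axis=1)
y = np.where(dists < 0.7, 1.0, -1.0)
model = LocalKRR(centers).fit(X, y)
train_acc = (model.predict(X) == y).mean()
```

Each cell also gets its own decision boundary, which is the "different structures treated in different ways" point made in the abstract.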
• Most state-of-the-art text detection methods are specific to horizontal Latin text and are not fast enough for real-time applications. We introduce Segment Linking (SegLink), an oriented text detection method. The main idea is to decompose text into two locally detectable elements, namely segments and links. A segment is an oriented box covering a part of a word or text line; a link connects two adjacent segments, indicating that they belong to the same word or text line. Both elements are detected densely at multiple scales by an end-to-end trained, fully-convolutional neural network. Final detections are produced by combining segments connected by links. Compared with previous methods, SegLink improves along the dimensions of accuracy, speed, and ease of training. It achieves an f-measure of 75.0% on the standard ICDAR 2015 Incidental (Challenge 4) benchmark, outperforming the previous best by a large margin. It runs at over 20 FPS on 512x512 images. Moreover, without modification, SegLink is able to detect long lines of non-Latin text, such as Chinese.
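The final combination step, grouping segments joined by links, is essentially a connected-components computation. A minimal union-find sketch (the trained detector and the segments' geometry are outside its scope; the function name is illustrative):

```python
def group_segments(num_segments, links):
    """Group detected segments into words/text lines: each link (i, j) says
    segments i and j belong together; the connected components of the
    resulting graph are the final detections (union-find with path halving)."""
    parent = list(range(num_segments))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    for i, j in links:
        parent[find(i)] = find(j)

    groups = {}
    for s in range(num_segments):
        groups.setdefault(find(s), []).append(s)
    return list(groups.values())

# Five segments; links join 0-1-2 into one text line and 3-4 into another.
groups = group_segments(5, [(0, 1), (1, 2), (3, 4)])
```

In the full method, each group's member boxes would then be merged into one oriented bounding box.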
• We provide a new proof of convergence to motion by mean curvature (MMC) for the Merriman-Bence-Osher (MBO) thresholding algorithm. The proof is elementary and does not rely on a maximum principle for the scheme. The strategy is to construct a natural ansatz for the solution and then estimate the error. The proof thus also provides a convergence rate. Only some weak integrability assumptions on the heat kernel, but not its positivity, are used. Currently the result is proved in the case when a smooth classical solution of MMC exists.
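For readers unfamiliar with the scheme, an MBO step alternates heat diffusion of an indicator function with thresholding at 1/2. A minimal periodic-grid sketch (grid size, time step, and the FFT-based heat solver are illustrative choices, not taken from the paper); it shows a disk shrinking, consistent with motion by mean curvature:

```python
import numpy as np

def mbo_step(chi, dt):
    """One MBO step: run the heat flow on the indicator function for time dt
    (heat semigroup computed via FFT on the periodic unit square), then
    threshold at 1/2."""
    n = chi.shape[0]
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=1.0 / n)   # wavenumbers on [0, 1)^2
    kx, ky = np.meshgrid(k, k, indexing="ij")
    heat = np.exp(-(kx**2 + ky**2) * dt)             # Fourier multiplier of e^{dt*Laplacian}
    u = np.fft.ifft2(np.fft.fft2(chi) * heat).real
    return (u >= 0.5).astype(float)

# Under MMC a disk shrinks: R(t)^2 = R(0)^2 - 2t, so its area decreases.
n = 256
x = np.linspace(0.0, 1.0, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
chi = (((X - 0.5)**2 + (Y - 0.5)**2) <= 0.3**2).astype(float)

dt = 2e-3
areas = [chi.mean()]
for _ in range(10):
    chi = mbo_step(chi, dt)
    areas.append(chi.mean())
```

The convergence question the paper addresses is exactly how fast this discrete-in-time interface tracks the true MMC interface as dt goes to 0.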
• Mar 21 2017 cs.LG arXiv:1703.06514v1
We propose a new method for training iterative collective classifiers for labeling nodes in network data. The iterative classification algorithm (ICA) is a canonical method for incorporating relational information into classification. Yet, existing methods for training ICA models rely on the assumption that relational features reflect the true labels of the nodes. This unrealistic assumption introduces a bias that is inconsistent with the actual prediction algorithm. In this paper, we introduce recurrent collective classification (RCC), a variant of ICA analogous to recurrent neural network prediction. RCC accommodates any differentiable local classifier and relational feature functions. We provide gradient-based strategies for optimizing over model parameters to more directly minimize the loss function. In our experiments, this direct loss minimization translates to improved accuracy and robustness on real network data. We demonstrate the robustness of RCC in settings where local classification is very noisy, settings that are particularly challenging for ICA.
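As background, the ICA inference loop that RCC differentiates through can be sketched as follows (the toy local classifier and the neighbor-mean relational feature are illustrative stand-ins, not the paper's models):

```python
def ica_predict(features, adj, local_clf, num_iters=5):
    """Iterative classification (ICA) inference: bootstrap from a neutral
    relational feature, then repeatedly re-predict each node from its own
    feature plus the mean of its neighbors' current labels."""
    labels = [local_clf(f, 0.5) for f in features]       # bootstrap
    for _ in range(num_iters):
        new = []
        for i, f in enumerate(features):
            nbrs = adj[i]
            rel = sum(labels[j] for j in nbrs) / len(nbrs) if nbrs else 0.5
            new.append(local_clf(f, rel))
        labels = new                                     # synchronous update
    return labels

# Toy local classifier: positive when own feature and neighborhood both lean positive.
clf = lambda f, rel: 1 if 0.5 * f + 0.5 * rel > 0.5 else 0
path_graph = {0: [1], 1: [0, 2], 2: [1]}
labels = ica_predict([0.9, 0.6, 0.2], path_graph, clf)   # node 2 gets pulled up by its neighbors
```

RCC's contribution is to train through this loop with gradients, like a recurrent network, rather than training the local classifier against the true neighbor labels.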
• Given an image and a natural language question about it, our method outputs a text-based answer to the question; this task is called Visual Question Answering (VQA). Our algorithm has two main modules. The first module takes the question as input and outputs a set of basic questions related to the main question. The second module takes the main question, the image, and these basic questions as input and outputs the text-based answer to the main question. We formulate the basic-question generation problem as a LASSO optimization problem, and also propose a criterion for how to exploit these basic questions to help answer the main question. Our method is evaluated on the challenging VQA dataset and yields competitive performance compared to the state of the art.
• The term gestalt is widely used in psychology to describe the tendency of the human mind to perceive an object not as a set of parts but as a unified whole. Music in general is polytonic, i.e., a combination of a number of pure tones (frequencies) mixed together in a manner that sounds harmonious. The study of human brain responses to different frequency groups of an acoustic signal can give us excellent insight into the neural and functional architecture of brain functions. In this work we analyze the effect of different frequency bands of music on the various frequency rhythms of the human brain, obtained from EEG data of 5 participants. Four widely popular Rabindrasangeet clips were subjected to the wavelet transform to extract five resonant frequency bands from the original music signal. These resonant frequency bands were presented to the subjects as auditory stimuli, and EEG signals were recorded simultaneously at 19 different locations on the scalp. The recorded EEG signals were noise-cleaned and subjected to Multifractal Detrended Fluctuation Analysis (MFDFA) in the alpha, theta and gamma frequency ranges. Thus, we obtained complexity values (in the form of multifractal spectral width) in the alpha, theta and gamma EEG rhythms corresponding to different frequency bands of the music. We observe frequency-specific, arousal-based responses in different lobes of the brain, as well as in specific EEG bands, corresponding to the musical stimuli. This finding can be of immense importance in the field of cognitive music therapy.
• In previous work the authors introduced a new class of modular quasi-Hopf algebras $D^{\omega}(G, A)$ associated to a finite group $G$, a central subgroup $A$, and a $3$-cocycle $\omega\in Z^3(G, \mathbb{C}^{\times})$. In the present paper we propose a description of the class of orbifold models of rational vertex operator algebras whose module category is tensor equivalent to $D^{\omega}(G, A)$-mod. The paper includes background on quasi-Hopf algebras and a discussion of some relevant orbifolds.
• Handling large or continuous state spaces has been a long-standing challenge in reinforcement learning. Temporal abstraction has gone some way toward making this possible, but planning efficiently with temporal abstraction remains an issue. Moreover, using spatial abstractions to learn policies for various situations at once while using temporally abstract models is an open problem. We propose an efficient algorithm that is convergent under linear function approximation while planning with temporally abstract actions. We show how this algorithm can be combined with randomly generated option models over multiple time scales to build agents that must act in real time. Using these randomly generated option models over multiple time scales is shown to reduce the number of decision epochs required to solve a given task, effectively reducing the time needed for deliberation.
• A semantic segmentation algorithm must assign a label to every pixel in an image. Recently, semantic segmentation of RGB imagery has advanced significantly due to deep learning. Because creating datasets for semantic segmentation is laborious, these datasets tend to be significantly smaller than object recognition datasets. This makes it difficult to directly train a deep neural network for semantic segmentation, because it will be prone to overfitting. To cope with this, deep learning models typically use convolutional neural networks pre-trained on large-scale image classification datasets, which are then fine-tuned for semantic segmentation. For non-RGB imagery, this is currently not possible because large-scale labeled non-RGB datasets do not exist. In this paper, we developed two deep neural networks for semantic segmentation of multispectral remote sensing imagery. Prior to training on the target dataset, we initialize the networks with large amounts of synthetic multispectral imagery. We show that this significantly improves results on real-world remote sensing imagery, and we establish a new state-of-the-art result on the challenging Hamlin Beach State Park Dataset.
• The LIGO results are among the greatest experimental achievements of all time. Time and again scientists have compared this feat to Galileo pointing his telescope to the sky, this time offering an 'ear' to the cosmos. After the remarkable landmark of detection, the physics community will soon turn to studying the properties of the sources, addressing fundamental questions in astrophysics and cosmology. A combined numerical and analytic effort to tackle the binary problem is of paramount importance in light of the nascent program of multi-messenger astronomy. The century of gravitational wave science is in the making, and many discoveries are yet to come in the advent of a new era of 'precision gravity'.
• We construct and investigate Specht modules $\mathcal{S}^\lambda$ for cyclotomic quiver Hecke algebras in type $C^{(1)}_\ell$ and $C_\infty$, which are labelled by multipartitions $\lambda$. It is shown that in type $C_\infty$, the Specht module $\mathcal{S}^\lambda$ has a homogeneous basis indexed by standard tableaux of shape $\lambda$, which yields a graded character formula and good properties with the exact functors $E_i^\Lambda$ and $F_i^\Lambda$. For type $C^{(1)}_\ell$, we propose a conjecture.
• We propose a fully-automated method for accurate and robust detection and segmentation of potentially cancerous lesions in the liver and lymph nodes. The process is performed in three steps: organ detection, lesion detection, and lesion segmentation. Our method applies machine learning techniques such as marginal space learning and convolutional neural networks, as well as active contour models. The method proves robust in its handling of extremely high lesion diversity. We tested our method on volumetric computed tomography (CT) images, including 42 volumes containing liver lesions and 86 volumes containing 595 pathological lymph nodes. Preliminary results under 10-fold cross validation show that for both the liver lesions and the lymph nodes, a total detection sensitivity of 0.53 and an average Dice score of $0.71 \pm 0.15$ for segmentation were obtained.
• In this work, we present the Text Conditioned Auxiliary Classifier Generative Adversarial Network (TAC-GAN), a text-to-image Generative Adversarial Network (GAN) for synthesizing images from their text descriptions. Previous approaches have conditioned the generative process on textual data, but combining this with class information, which is known to diversify the generated samples and improve their structural coherence, has not been explored. We trained the TAC-GAN model on the Oxford-102 dataset of flowers, and evaluated the discriminability of the generated images with the Inception score, as well as their diversity using the Multi-Scale Structural Similarity Index (MS-SSIM). Our approach outperforms the state-of-the-art models: its Inception score is 3.45, corresponding to a relative increase of 7.8% over the recently introduced StackGAN. A comparison of the mean MS-SSIM scores of the training and generated samples per class shows that our approach is able to generate highly diverse images, with an average MS-SSIM of 0.14 over all generated classes.
• In this work, we propose the combined usage of low- and high-level blocks of convolutional neural networks (CNNs) for improving object recognition. While recent research has focused either on propagating context from all layers, including the very low-level ones (e.g. ResNet), or on having multiple loss layers (e.g. GoogLeNet), the importance of features close to the higher layers has been ignored. This paper postulates that using context closer to the high-level layers provides scale and translation invariance and works better than using the top layer only. In particular, we extend AlexNet and GoogLeNet with additional connections in the top $n$ layers. To demonstrate the effectiveness of the proposed approach, we evaluate it on the standard ImageNet task. The relative reduction of the classification error is around 1-2% without affecting the computational cost. Furthermore, we show that this approach is orthogonal to typical test-time data augmentation techniques, as recently introduced by Szegedy et al. (leading to a runtime reduction of 144 during test time).
• This paper addresses the problem of RGBD object recognition in real-world applications, where large amounts of annotated training data are typically unavailable. To overcome this problem, we propose a novel, weakly-supervised learning architecture (DCNN-GPC) which combines parametric models (a pair of Deep Convolutional Neural Networks (DCNN) for the RGB and D modalities) with non-parametric models (Gaussian Process Classification). Our system is initially trained using a small amount of labeled data, and then automatically propagates labels to large-scale unlabeled data. We first run 3D-based objectness detection on RGBD videos to acquire many unlabeled object proposals, and then employ DCNN-GPC to label them. As a result, our multi-modal DCNN can be trained end-to-end using only a small amount of human annotation. Finally, our 3D-based objectness detection and multi-modal DCNN are integrated into a real-time detection and recognition pipeline. In our approach, bounding-box annotations are not required and boundary-aware detection is achieved. We also propose a novel way to pretrain a DCNN for the depth modality, by training on virtual depth images projected from CAD models. We pretrain our multi-modal DCNN on public 3D datasets, achieving performance comparable to state-of-the-art methods on the Washington RGB-D Dataset. We then fine-tune the network by further training on a small amount of annotated data from our novel dataset of industrial objects (nuclear waste simulants). Our weakly supervised approach has proven highly effective in solving a novel RGBD object recognition application which lacks human annotations.
• Decision-makers often learn by acquiring information from distinct sources that possibly provide complementary information. We consider a decision-maker who sequentially samples from a finite set of Gaussian signals, and wants to predict a persistent multi-dimensional state at an unknown final period. What signal should he choose to observe in each period? Related problems about optimal experimentation and dynamic learning tend to have solutions that can only be approximated or implicitly characterized. In contrast, we find that in our problem, the dynamically optimal path of signal acquisitions generically: (1) eventually coincides at every period with the myopic path of signal acquisitions, and (2) eventually achieves "total optimality," so that at every large period, the decision-maker will not want to revise his previous signal acquisitions, even if given this opportunity. In special classes of environments that we describe, these properties attain not only eventually, but from period 1. Finally, we characterize the asymptotic frequency with which each signal is chosen, and how this depends on primitives of the informational environment.
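In the special case of a state with independent Gaussian coordinates, the myopic path the paper compares against is easy to compute: each period, observe the signal with the largest one-step reduction in posterior variance. A sketch (the diagonal setup is an illustrative simplification of the paper's environment):

```python
def myopic_path(prior_var, noise_var, periods):
    """Myopic signal acquisition for a state with independent Gaussian
    coordinates: observing coordinate i (noise variance s_i) updates its
    posterior variance v_i -> v_i * s_i / (v_i + s_i); each period, pick
    the signal whose observation most reduces total posterior variance."""
    v = list(prior_var)
    path = []
    for _ in range(periods):
        # one-step variance reduction from observing signal i: v_i^2 / (v_i + s_i)
        gains = [v[i] ** 2 / (v[i] + noise_var[i]) for i in range(len(v))]
        i = max(range(len(v)), key=gains.__getitem__)
        v[i] = v[i] * noise_var[i] / (v[i] + noise_var[i])
        path.append(i)
    return path, v

# Start far more uncertain about coordinate 0; the myopic rule samples it first,
# then alternates as the posterior variances even out.
path, v = myopic_path([4.0, 1.0], [1.0, 1.0], 3)
```

The paper's result is that, generically, the dynamically optimal acquisition path eventually coincides with this greedy rule at every period.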
• Mar 21 2017 q-fin.PR physics.soc-ph arXiv:1703.06351v1
We consider the estimation of binary election outcomes as martingales and propose an arbitrage pricing for use when one continuously updates estimates. We argue that the estimator needs to be priced as a binary option, since the arbitrage valuation minimizes the conventionally used Brier score for tracking the accuracy of probability assessors. We create a dual martingale process $Y$ in $[L,H]$ from the standard arithmetic Brownian motion $X$ in $(-\infty, \infty)$ and price elections accordingly. The dual process $Y$ can represent the numerical votes needed for success. We show the relationship between the volatility of the estimator and that of the underlying variable. When there is high uncertainty about the final outcome: 1) the arbitrage value of the binary gets closer to 50\%; 2) the estimate should not undergo large changes even if polls or other bases show significant variations. There are arbitrage relationships between 1) the binary value, 2) the estimate of $Y$, and 3) the volatility of the estimate of $Y$ over the remaining time to expiration. We note that these arbitrage relationships were often violated by various forecasting groups in the 2016 U.S. presidential election, as was the principle that all intermediate assessments of a candidate's success need to be considered, not just the final one.
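The first claim, that high uncertainty pushes the arbitrage value of the binary toward 50\%, can be illustrated with the standard binary price under arithmetic Brownian motion (the threshold and parameter values are illustrative, and this sketch omits the paper's bounded dual process $Y$):

```python
from math import erf, sqrt

def binary_price(x, threshold, sigma, tau):
    """Arbitrage value of a binary paying 1 if an arithmetic Brownian motion X,
    currently at x with volatility sigma, ends above `threshold` after time tau:
    P(X_tau > threshold) = Phi((x - threshold) / (sigma * sqrt(tau)))."""
    z = (x - threshold) / (sigma * sqrt(tau))
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))   # standard normal CDF at z

# A candidate "ahead" by 0.1 with one unit of time left: as uncertainty
# (sigma) grows, the properly priced forecast is dragged toward 50%.
prices = [binary_price(0.1, 0.0, s, 1.0) for s in (0.05, 0.2, 1.0, 5.0)]
```

The same formula also shows the second claim: when sigma is large, moderate moves in x barely change the price, so a well-priced forecast should not swing with every poll.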
• Recent papers have shown that neural networks obtain state-of-the-art performance on several different sequence tagging tasks. One appealing property of such systems is their generality: excellent performance can be achieved with a unified architecture and without task-specific feature engineering. However, it is unclear whether such systems can be used for tasks without large amounts of training data. In this paper we explore the problem of transfer learning for neural sequence taggers, where a source task with plentiful annotations (e.g., POS tagging on the Penn Treebank) is used to improve performance on a target task with fewer available annotations (e.g., POS tagging for microblogs). We examine the effects of transfer learning for deep hierarchical recurrent networks across domains, applications, and languages, and show that significant improvement can often be obtained. These gains lead to improvements over the current state of the art on several well-studied tasks.

Thomas Klimpel Apr 20 2017 09:16 UTC

This paper [appeared][1] in February 2016 in Chaos, a peer-reviewed interdisciplinary journal published by the American Institute of Physics (AIP).

It has been reviewed publicly by amateurs both [favorably][2] and [unfavorably][3]. The favorable review took the last sentence of the abstract ("These invalid

...(continued)
Veaceslav Molodiuc Apr 19 2017 07:26 UTC

http://ibiblio.org/e-notes/Chaos/intermit.htm

Zoltán Zimborás Apr 18 2017 09:47 UTC

Great note. I really like the two closing sentences: "Of course, any given new approach to a hard and extensively studied problem has a very low probability to lead to a direct solution (some popular accounts may not have emphasized this to the degree we would have preferred). But arguably, this makes the

...(continued)
James Wootton Apr 18 2017 08:29 UTC

Interesting to start getting perspectives from actual end users. But this does focus massively on quantum annealing, rather than a 'true' universal and fault-tolerant QC.

Aram Harrow Apr 17 2017 13:45 UTC

It must feel good to get this one out there! :)

Planat Apr 14 2017 08:11 UTC

First of all, thanks to all for helping to clarify some hidden points of our paper.
As you can see, the field norm generalizes the standard Hilbert-Schmidt norm.
It works for SIC [e.g. d=2, d=3 (the Hesse) and d=8 (the Hoggar)].

The first non-trivial case is with d=4 when one needs to extend th

...(continued)
Robin Blume-Kohout Apr 14 2017 03:03 UTC

Okay, I see the resolution to my confusion now (and admit that I was confused). Thanks to Michel, Marcus, Blake, and Steve!

Since I don't know the first thing about cyclotomic field norms... can anybody explain the utility of this norm, for this problem? I mean, just to be extreme, I could define

...(continued)
Steve Flammia Apr 13 2017 19:16 UTC

Just to clarify Michel's earlier remark, the field norm for the cyclotomics defines the norm in which these vectors are equiangular, and then they will generally **not** be equiangular in the standard norm based on the Hilbert-Schmidt inner product. In the example that he quotes,
\|(7\pm 3 \sqrt{

...(continued)
Marcus Appleby Apr 13 2017 19:16 UTC

I worded that badly, since you clearly have explained the sense in which you are using the word. I am wondering, however, how your definition relates to the usual one. Is it a generalization? Or just plain different? For instance, would a SIC be equiangular relative to your definition (using SI

...(continued)
Marcus Appleby Apr 13 2017 18:54 UTC

I am a little confused by this. As I use the term, lines are equiangular if and only if the "trace of pairwise product of (distinct) projectors is constant". You seem to be using the word in a different sense. It might be helpful if you were to explain exactly what is that sense.