# Top arXiv papers

• We show that Clifford operations on qubit stabilizer states are non-contextual and can be represented by non-negative quasi-probability distributions associated with a Wigner-Weyl-Moyal formalism. This is accomplished by generalizing the Wigner-Weyl-Moyal formalism to three generators instead of two---producing an exterior, or Grassmann, algebra---which results in Clifford group gates for qubits that act as a permutation on the finite Weyl phase space points naturally associated with stabilizer states. As a result, a non-negative probability distribution can be associated with each stabilizer state's three-generator Wigner function, and these distributions evolve deterministically to one other under Clifford gates. This corresponds to a hidden variable theory that is non-contextual and local for qubit Clifford operations. Equivalently, we show that qubit Clifford gates can be expressed as propagators within the three-generator Wigner-Weyl-Moyal formalism whose semiclassical expansion is truncated at order $\hbar^0$ with a finite number of terms. The $T$-gate, which extends the Clifford gate set to one capable of universal quantum computation, requires a semiclassical expansion of its propagator to order $\hbar^1$. We compare this approach to previous quasi-probability descriptions of qubits that relied on the two-generator Wigner-Weyl-Moyal formalism and find that the two-generator Weyl symbols of stabilizer states result in a description of evolution under Clifford gates that is state-dependent. We show that this two-generator description of stabilizer evolution is thus a non-local and contextual hidden variable theory---it is a contextual description of a non-contextual process. We have thus extended the established result that Clifford stabilizer operations are non-contextual and have non-negative quasi-probability distributions in the odd $d$-dimensional case to $d=2$ qubits.
• An important class of contextuality arguments in quantum foundations are the All-versus-Nothing (AvN) proofs, generalising a construction originally due to Mermin. We present a general formulation of All-versus-Nothing arguments, and a complete characterisation of all such arguments which arise from stabiliser states. We show that every AvN argument for an n-qubit stabiliser state can be reduced to an AvN proof for a three-qubit state which is local Clifford-equivalent to the tripartite GHZ state. This is achieved through a combinatorial characterisation of AvN arguments, the AvN triple Theorem, whose proof makes use of the theory of graph states. This result enables the development of a computational method to generate all the AvN arguments in $\mathbb{Z}_2$ on n-qubit stabiliser states. We also present new insights into the stabiliser formalism and its connections with logic.
• Communication over a noisy channel is often conducted in a setting in which different input symbols to the channel incur a certain cost. For example, for the additive white Gaussian noise channel, the cost associated with a real number input symbol is the square of its magnitude. In such a setting, it is often useful to know the maximum amount of information that can be reliably transmitted per cost incurred. This is known as the capacity per unit cost. In this paper, we generalize the capacity per unit cost to various communication tasks involving a quantum channel; in particular, we consider classical communication, entanglement-assisted classical communication, private communication, and quantum communication. For each task, we define the corresponding capacity per unit cost and derive a formula for it via the expression for the capacity per channel use. Furthermore, for the special case in which there is a zero-cost quantum state, we obtain expressions for the various capacities per unit cost in terms of an optimized relative entropy involving the zero-cost state. For each communication task, we construct an explicit pulse-position-modulation coding scheme that achieves the capacity per unit cost. Finally, we compute capacities per unit cost for various quantum Gaussian channels.
• We demonstrate a synchronized readout (SR) technique for spectrally selective detection of oscillating magnetic fields with sub-millihertz resolution, using coherent manipulation of solid state spins. The SR technique is implemented in a sensitive magnetometer (~50 picotesla/Hz^(1/2)) based on nitrogen vacancy (NV) centers in diamond, and used to detect nuclear magnetic resonance (NMR) signals from liquid-state samples. We obtain NMR spectral resolution ~3 Hz, which is nearly two orders of magnitude narrower than previously demonstrated with NV based techniques, using a sample volume of ~1 picoliter. This is the first application of NV-detected NMR to sense Boltzmann-polarized nuclear spin magnetization, and the first to observe chemical shifts and J-couplings.
• In this paper, we study convergence properties of the gradient Expectation-Maximization algorithm \citelange1995gradient for Gaussian Mixture Models for general number of clusters and mixing coefficients. We derive the convergence rate depending on the mixing coefficients, minimum and maximum pairwise distances between the true centers and dimensionality and number of components; and obtain a near-optimal local contraction radius. While there have been some recent notable works that derive local convergence rates for EM in the two equal mixture symmetric GMM, in the more general case, the derivations need structurally different and non-trivial arguments. We use recent tools from learning theory and empirical processes to achieve our theoretical results.
• May 25 2017 stat.ML cs.LG arXiv:1705.08826v1
In this work, we introduce the average top-$k$ (AT$_k$) loss as a new ensemble loss for supervised learning, which is the average over the $k$ largest individual losses over a training dataset. We show that the AT$_k$ loss is a natural generalization of the two widely used ensemble losses, namely the average loss and the maximum loss, but can combines their advantages and mitigate their drawbacks to better adapt to different data distributions. Furthermore, it remains a convex function over all individual losses, which can lead to convex optimization problems that can be solved effectively with conventional gradient-based method. We provide an intuitive interpretation of the AT$_k$ loss based on its equivalent effect on the continuous individual loss functions, suggesting that it can reduce the penalty on correctly classified data. We further give a learning theory analysis of MAT$_k$ learning on the classification calibration of the AT$_k$ loss and the error bounds of AT$_k$-SVM. We demonstrate the applicability of minimum average top-$k$ learning for binary classification and regression using synthetic and real datasets.
• Characterization and certification of nonlocal correlations is one of the the central topics in quantum information theory. In this work, we develop the detection methods of entanglement and steering based on the universal uncertainty relations and fine-grained uncertainty relations. In the course of our study, the uncertainty relations are formulated in majorization form, and the uncertainty quantifier can be chosen as any convex Schur concave functions, this leads to a large set of inequalities, including all existing criteria based on entropies. We address the question that if all steerable states (or entangled states) can be witnessed by some uncertainty-based inequality, we find that for pure states and many important families of states, this is the case.
• Compression and computational efficiency in deep learning have become a problem of great significance. In this work, we argue that the most principled and effective way to attack this problem is by taking a Bayesian point of view, where through sparsity inducing priors we prune large parts of the network. We introduce two novelties in this paper: 1) we use hierarchical priors to prune nodes instead of individual weights, and 2) we use the posterior uncertainties to determine the optimal fixed point precision to encode the weights. Both factors significantly contribute to achieving the state of the art in terms of compression rates, while still staying competitive with methods designed to optimize for speed or energy efficiency.
• Generative moment matching network (GMMN) is a deep generative model that differs from Generative Adversarial Network (GAN) by replacing the discriminator in GAN with a two-sample test based on kernel maximum mean discrepancy (MMD). Although some theoretical guarantees of MMD have been studied, the empirical performance of GMMN is still not as competitive as that of GAN on challenging and large benchmark datasets. The computational efficiency of GMMN is also less desirable in comparison with GAN, partially due to its requirement for a rather large batch size during the training. In this paper, we propose to improve both the model expressiveness of GMMN and its computational efficiency by introducing adversarial kernel learning techniques, as the replacement of a fixed Gaussian kernel in the original GMMN. The new approach combines the key ideas in both GMMN and GAN, hence we name it MMD-GAN. The new distance measure in MMD-GAN is a meaningful loss that enjoys the advantage of weak topology and can be optimized via gradient descent with relatively small batch sizes. In our evaluation on multiple benchmark datasets, including MNIST, CIFAR- 10, CelebA and LSUN, the performance of MMD-GAN significantly outperforms GMMN, and is competitive with other representative GAN works.
• Large-scale kernel approximation is an important problem in machine learning research. Approaches using random Fourier features have become increasingly popular [Rahimi and Recht, 2007], where kernel approximation is treated as empirical mean estimation via Monte Carlo (MC) or Quasi-Monte Carlo (QMC) integration [Yang et al., 2014]. A limitation of the current approaches is that all the features receive an equal weight summing to 1. In this paper, we propose a novel shrinkage estimator from "Stein effect", which provides a data-driven weighting strategy for random features and enjoys theoretical justifications in terms of lowering the empirical risk. We further present an efficient randomized algorithm for large-scale applications of the proposed method. Our empirical results on six benchmark data sets demonstrate the advantageous performance of this approach over representative baselines in both kernel approximation and supervised learning tasks.
• In this work, we provide a general framework for adding a linearizable iterator to data structures with set operations. We propose a condition on these set operations, called locality, so that any data structure implemented from local atomic operations can be augmented with a linearizable iterator as described by our framework. We then apply the iterator framework to various data structures, prove locality of their operations, and demonstrate that the iterator framework does not significantly affect the performance of concurrent operations.
• May 25 2017 cs.CV cs.LG cs.NE stat.ML arXiv:1705.08881v1
The key idea of current deep learning methods for dense prediction is to apply a model on a regular patch centered on each pixel to make pixel-wise predictions. These methods are limited in the sense that the patches are determined by network architecture instead of learned from data. In this work, we propose the dense transformer networks, which can learn the shapes and sizes of patches from data. The dense transformer networks employ an encoder-decoder architecture, and a pair of dense transformer modules are inserted into each of the encoder and decoder paths. The novelty of this work is that we provide technical solutions for learning the shapes and sizes of patches from data and efficiently restoring the spatial correspondence required for dense prediction. The proposed dense transformer modules are differentiable, thus the entire network can be trained. We apply the proposed networks on natural and biological image segmentation tasks and show superior performance is achieved in comparison to baseline methods.
• We use the most recent cosmic microwave background (CMB) data to perform a Bayesian statistical analysis and discuss the observational viability of inflationary models with a non-minimal coupling $\xi$, between the inflaton field and the Ricci scalar. We particularize our analysis to two examples of small and large field inflationary models, namely, the Coleman-Weinberg and the chaotic quartic potentials. We find that (\textiti)the $\xi$ parameter is closely correlated with the primordial amplitude; (\textitii) although improving the agreement with the CMB data in the $r - n_s$ plane, where $r$ is the tensor-to-scalar ratio and $n_s$ the primordial spectral index, a non-null coupling is strongly disfavoured with respect to the minimally coupled standard $\Lambda$CDM model.
• Evaluating the performance of generative models for unsupervised learning is inherently challenging due to the lack of well-defined and tractable objectives. This is particularly difficult for implicit models such as generative adversarial networks (GANs) which perform extremely well in practice for tasks such as sample generation, but sidestep the explicit characterization of a density. We propose Flow-GANs, a generative adversarial network with the generator specified as a normalizing flow model which can perform exact likelihood evaluation. Subsequently, we learn a Flow-GAN using a hybrid objective that integrates adversarial training with maximum likelihood estimation. We show empirically the benefits of Flow-GANs on MNIST and CIFAR-10 datasets in learning generative models that can attain low generalization error based on the log-likelihoods and generate high quality samples. Finally, we show a simple, yet hard to beat baseline for GANs based on Gaussian Mixture Models.
• Semi-supervised learning methods using Generative Adversarial Networks (GANs) have shown promising empirical success recently. Most of these methods use a shared discriminator/classifier which discriminates real examples from fake while also predicting the class label. Motivated by the ability of the GANs generator to capture the data manifold well, we propose to estimate the tangent space to the data manifold using GANs and employ it to inject invariances into the classifier. In the process, we propose enhancements over existing methods for learning the inverse mapping (i.e., the encoder) which greatly improves in terms of semantic similarity of the reconstructed sample with the input sample. We observe considerable empirical gains in semi-supervised learning over baselines, particularly in the cases when the number of labeled examples is low. We also provide insights into how fake examples influence the semi-supervised learning procedure.
• The knowledge representation community has built general-purpose ontologies which contain large amounts of commonsense knowledge over relevant aspects of the world, including useful visual information, e.g.: "a ball is used by a football player", "a tennis player is located at a tennis court". Current state-of-the-art approaches for visual recognition do not exploit these rule-based knowledge sources. Instead, they learn recognition models directly from training examples. In this paper, we study how general-purpose ontologies---specifically, MIT's ConceptNet ontology---can improve the performance of state-of-the-art vision systems. As a testbed, we tackle the problem of sentence-based image retrieval. Our retrieval approach incorporates knowledge from ConceptNet on top of a large pool of object detectors derived from a deep learning technique. In our experiments, we show that ConceptNet can improve performance on a common benchmark dataset. Key to our performance is the use of the ESPGAME dataset to select visually relevant relations from ConceptNet. Consequently, a main conclusion of this work is that general-purpose commonsense ontologies improve performance on visual reasoning tasks when properly filtered to select meaningful visual relations.
• Syntactic parsing is a key task in natural language processing which has been dominated by symbolic, grammar-based syntactic parsers. Neural networks, with their distributed representations, are challenging these methods. In this paper, we want to show that existing parsing algorithms can cross the border and be defined over distributed representations. We then define D-CYK: a version of the traditional CYK algorithm defined over distributed representations. Our D-CYK operates as the original CYK but uses matrix multiplications. These operations are compatible with traditional neural networks. Experiments show that D-CYK approximates the original CYK. By showing that CYK can be performed on distributed representations, our D-CYK opens the possibility of defining recurrent layers of CYK-informed neural networks.
• We would like to learn a representation of the data which decomposes an observation into factors of variation which we can independently control. Specifically, we want to use minimal supervision to learn a latent representation that reflects the semantics behind a specific grouping of the data, where within a group the samples share a common factor of variation. For example, consider a collection of face images grouped by identity. We wish to anchor the semantics of the grouping into a relevant and disentangled representation that we can easily exploit. However, existing deep probabilistic models often assume that the observations are independent and identically distributed. We present the Multi-Level Variational Autoencoder (ML-VAE), a new deep probabilistic model for learning a disentangled representation of a set of grouped observations. The ML-VAE separates the latent representation into semantically meaningful parts by working both at the group level and the observation level, while retaining efficient test-time inference. Quantitative and qualitative evaluations show that the ML-VAE model (i) learns a semantically meaningful disentanglement of grouped data, (ii) enables manipulation of the latent representation, and (iii) generalises to unseen groups.
• This paper is a deep investigation of cross-language plagiarism detection methods on a new recently introduced open dataset, which contains parallel and comparable collections of documents with multiple characteristics (different genres, languages and sizes of texts). We investigate cross-language plagiarism detection methods for 6 language pairs on 2 granularities of text units in order to draw robust conclusions on the best methods while deeply analyzing correlations across document styles and languages.
• The effectiveness of generative adversarial approaches in producing images according to a specific style or visual domain has recently opened new directions to solve the unsupervised domain adaptation problem. It has been shown that source labeled images can be modified to mimic target samples making it possible to train directly a classifier in the target domain, despite the original lack of annotated data. Inverse mappings from the target to the source domain have also been evaluated but only passing through adapted feature spaces, thus without new image generation. In this paper we propose to better exploit the potential of generative adversarial networks for adaptation by introducing a novel symmetric mapping among domains. We jointly optimize bi-directional image transformations combining them with target self-labeling. Moreover we define a new class consistency loss that aligns the generators in the two directions imposing to conserve the class identity of an image passing through both domain mappings. A detailed qualitative and quantitative analysis of the reconstructed images confirm the power of our approach. By integrating the two domain specific classifiers obtained with our bi-directional network we exceed previous state-of-the-art unsupervised adaptation results on four different benchmark datasets.
• Learning individual-level causal effects from observational data, such as inferring the most effective medication for a specific patient, is a problem of growing importance for policy makers. The most important aspect of inferring causal effects from observational data is the handling of confounders, factors that affect both an intervention and its outcome. A carefully designed observational study attempts to measure all important confounders. However, even if one does not have direct access to all confounders, there may exist noisy and uncertain measurement of proxies for confounders. We build on recent advances in latent variable modelling to simultaneously estimate the unknown latent space summarizing the confounders and the causal effect. Our method is based on Variational Autoencoders (VAE) which follow the causal structure of inference with proxies. We show our method is significantly more robust than existing methods, and matches the state-of-the-art on previous benchmarks focused on individual treatment effects.
• The increasing complexity of the power grid, due to higher penetration of distributed resources and the growing availability of interconnected, distributed metering devices re- quires novel tools for providing a unified and consistent view of the system. A computational framework for power systems data fusion, based on probabilistic graphical models, capable of combining heterogeneous data sources with classical state estimation nodes and other customised computational nodes, is proposed. The framework allows flexible extension of the notion of grid state beyond the view of flows and injection in bus-branch models, and an efficient, naturally distributed inference algorithm can be derived. An application of the data fusion model to the quantification of distributed solar energy is proposed through numerical examples based on semi-synthetic simulations of the standard IEEE 14-bus test case.
• Advances in artificial intelligence (AI) will transform modern life by reshaping transportation, health, science, finance, and the military. To adapt public policy, we need to better anticipate these advances. Here we report the results from a large survey of machine learning researchers on their beliefs about progress in AI. Researchers predict AI will outperform humans in many activities in the next ten years, such as translating languages (by 2024), writing high-school essays (by 2026), driving a truck (by 2027), working in retail (by 2031), writing a bestselling book (by 2049), and working as a surgeon (by 2053). Researchers believe there is a 50% chance of AI outperforming humans in all tasks in 45 years and of automating all human jobs in 120 years, with Asian respondents expecting these dates much sooner than North Americans. These results will inform discussion amongst researchers and policymakers about anticipating and managing trends in AI.
• Based on the progress of image recognition, video recognition has been extensively studied recently. However, most of the existing methods are focused on short-term but not long-term video recognition, called contextual video recognition. To address contextual video recognition, we use convolutional recurrent neural networks (ConvRNNs) having a rich spatio-temporal information processing capability, but ConvRNNs requires extensive computation that slows down training. In this paper, inspired by the normalization and detrending methods, we propose adaptive detrending (AD) for temporal normalization in order to accelerate the training of ConvRNNs, especially for convolutional gated recurrent unit (ConvGRU). AD removes internal covariate shift within a sequence of each neuron in recurrent neural networks (RNNs) by subtracting a trend. In the experiments for contextual recognition on ConvGRU, the results show that (1) ConvGRU clearly outperforms the feed-forward neural networks, (2) AD consistently offers a significant training acceleration and generalization improvement, and (3) AD is further improved by collaborating with the existing normalization methods.
• This paper describes a novel system that provides key parameters of HTTP Adaptive Streaming (HAS) sessions to the lower layers of the protocol stack. A non-intrusive traffic profiling solution is proposed that observes packet flows at the transmit queue of base stations, edge-routers, or gateways. By analyzing IP flows in real time, the presented scheme identifies different phases of an HAS session and estimates important application-layer parameters, such as play-back buffer state and video encoding rate. The introduced estimators only use IP-layer information, do not require standardization and work even with traffic that is encrypted via Transport Layer Security (TLS). Experimental results for a popular video streaming service clearly verify the high accuracy of the proposed solution. Traffic profiling, thus, provides a valuable alternative to cross-layer signaling and Deep Packet Inspection (DPI) in order to perform efficient network optimization for video streaming.
• In real-world classification tasks, it is difficult to collect samples of all possible categories of the environment in the training stage. Therefore, the classifier should be prepared for unseen classes. When an instance of an unseen class appears in the prediction stage, a robust classifier should have the ability to tell it is unseen, instead of classifying it to be any known category. In this paper, adopting the idea of adversarial learning, we propose the ASG framework for open-category classification. ASG generates positive and negative samples of seen categories in the unsupervised manner via an adversarial learning strategy. With the generated samples, ASG then learns to tell seen from unseen in the supervised manner. Experiments performed on several datasets show the effectiveness of ASG.
• Unsupervised structure learning in high-dimensional time series data has attracted a lot of research interests. For example, segmenting and labelling high dimensional time series can be helpful in behavior understanding and medical diagnosis. Recent advances in generative sequential modeling have suggested to combine recurrent neural networks with state space models (e.g., Hidden Markov Models). This combination can model not only the long term dependency in sequential data, but also the uncertainty included in the hidden states. Inheriting these advantages of stochastic neural sequential models, we propose a structured and stochastic sequential neural network, which models both the long-term dependencies via recurrent neural networks and the uncertainty in the segmentation and labels via discrete random variables. For accurate and efficient inference, we present a bi-directional inference network by reparamterizing the categorical segmentation and labels with the recent proposed Gumbel-Softmax approximation and resort to the Stochastic Gradient Variational Bayes. We evaluate the proposed model in a number of tasks, including speech modeling, automatic segmentation and labeling in behavior understanding, and sequential multi-objects recognition. Experimental results have demonstrated that our proposed model can achieve significant improvement over the state-of-the-art methods.
• Attempts to train a comprehensive artificial intelligence capable of solving multiple tasks have been impeded by a chronic problem called catastrophic forgetting. Although simply replaying all previous data alleviates the problem, it requires large memory and even worse, often infeasible in real world applications where the access to past data is limited. Inspired by the generative nature of hippocampus as a short-term memory system in primate brain, we propose the Deep Generative Replay, a novel framework with a cooperative dual model architecture consisting of a deep generative model ("generator") and a task solving model ("solver"). With only these two models, training data for previous tasks can easily be sampled and interleaved with those for a new task. We test our methods in several sequential learning settings involving image classification tasks.
• In the ultra-strong coupling regime of a light-matter system, the ground state exhibits non-trivial entanglement between the atom and photons. For the purposes of exploring the measurement and control of this ground state, here we analyze the dynamics of such an ultra-strongly-coupled system interacting with a driven nonlinear resonator acting as a measurement apparatus. Interestingly, although the coupling between the atom and the nonlinear resonator is much smaller than the typical energy scales of the ultra-strongly-coupled system, we show that we can generate a strong correlation between the nonlinear resonator and the light-matter system. A subsequent coarse- grained measurement on the nonlinear resonator significantly affects the light-matter system, and the phase of the light changes depending on the measurement results. Also, we investigate the conditions for when the nonlinear resonator can be entangled with the ultra-strongly coupled system, which is the mechanism that allows us to project the ground state of the ultra-strongly coupled system into a non-energy eigenstate.
• May 25 2017 math.RT math.GR arXiv:1705.08685v1
This paper studies intersections of principal blocks of a finite group with respect to different primes. We first define the block graph of a finite group G, whose vertices are the prime divisors of |G| and there is an edge between two vertices p \ne q if and only if the principal p- and q-blocks of G have a nontrivial common complex irreducible character of G. Then we determine the block graphs of finite simple groups, which turn out to be complete except those of J_1 and J_4. Also, we determine exactly when the Steinberg character of a finite simple group of Lie type lies in a principal block. Based on the above investigation, we obtain a criterion for the p-solvability of a finite group which in particular asserts that a finite group whose block graph has no triangle containing 2 is solvable.
• BGP (Border Gateway Protocol) serves as the primary routing protocol for the Internet, enabling Autonomous Systems (individual network operators) to exchange network reachability information. Alongside significant on-going research and development efforts, there is a practical need to understand the nature of events that occur on the Internet. Network operators are acutely aware of security-related incidents such as 'Prefix Hijacking' as well as the impact of network instabilities that ripple through the Internet. Recent research focused on the study of BGP anomalies (both network/prefix instability and security-related incidents) has been based on the analysis of historical logs. Further analysis to understand the nature of these anomalous events is not always sufficient to be able to differentiate malicious activities, such as prefix- or sub-prefix- hijacking, from those events caused by inadvertent misconfigurations. In addition, such techniques are challenged by a lack of sufficient resources to store and process data feeds in real-time from multiple BGP Vantage Points (VPs). In this paper, we present a BGP Deep-analysis application developed using the PNDA (Platform for Network Data Analytics) 'Big-Data' platform. PNDA provides a highly scalable environment that enables the ingestion and processing of 'live' BGP feeds from many vantage points in a schema-agnostic manner. The Apache Spark-based application, in conjunction with PNDA's distributed processing capabilities, is able to perform high-level insights as well as near-to-real-time statistical analysis
• Several recent works have empirically observed that Convolutional Neural Nets (CNNs) are (approximately) invertible. To understand this approximate invertibility phenomenon and how to leverage it more effectively, we focus on a theoretical explanation and develop a mathematical model of sparse signal recovery that is consistent with CNNs with random weights. We give an exact connection to a particular model of model-based compressive sensing (and its recovery algorithms) and random-weight CNNs. We show empirically that several learned networks are consistent with our mathematical analysis and then demonstrate that with such a simple theoretical framework, we can obtain reasonable re- construction results on real images. We also discuss gaps between our model assumptions and the CNN trained for classification in practical scenarios.
• Robot introspection, as opposed to anomaly detection typical in process monitoring, helps a robot understand what it is doing at all times. A robot should be able to identify its actions not only when failure or novelty occurs, but also as it executes any number of sub-tasks. As robots continue their quest of functioning in unstructured environments, it is imperative they understand what is it that they are actually doing to render them more robust. This work investigates the modeling ability of Bayesian nonparametric techniques on Markov Switching Process to learn complex dynamics typical in robot contact tasks. We study whether the Markov switching process, together with Bayesian priors can outperform the modeling ability of its counterparts: an HMM with Bayesian priors and without. The work was tested in a snap assembly task characterized by high elastic forces. The task consists of an insertion subtask with very complex dynamics. Our approach showed a stronger ability to generalize and was able to better model the subtask with complex dynamics in a computationally efficient way. The modeling technique is also used to learn a growing library of robot skills, one that when integrated with low-level control allows for robot online decision making.
• We study the problem of dictionary learning for signals that can be represented as polynomials or polynomial matrices, such as convolutive signals with time delays or acoustic impulse responses. Recently, we developed a method for polynomial dictionary learning based on the fact that a polynomial matrix can be expressed as a polynomial with matrix coefficients, where the coefficient of the polynomial at each time lag is a scalar matrix. However, a polynomial matrix can be also equally represented as a matrix with polynomial elements. In this paper, we develop an alternative method for learning a polynomial dictionary and a sparse representation method for polynomial signal reconstruction based on this model. The proposed methods can be used directly to operate on the polynomial matrix without having to access its coefficients matrices. We demonstrate the performance of the proposed method for acoustic impulse response modeling.
• Photonic interference is a key quantum resource for optical quantum computation, and in particular for so-called boson sampling machines. In interferometers with certain symmetries, genuine multiphoton quantum interference effectively suppresses certain sets of events, as in the original Hong-Ou-Mandel effect. Recently, it was shown that some classical and semi-classical models could be ruled out by identifying such suppressions in Fourier interferometers. Here we propose a suppression law suitable for random-input experiments in multimode Sylvester interferometers, and verify it experimentally using 4- and 8-mode integrated interferometers. The observed suppression is stronger than what is observed in Fourier interferometers of the same size, and could be relevant to certification of boson sampling machines and other experiments relying on bosonic interference.
• May 25 2017 cs.NE arXiv:1705.08639v1
Processing sequential data of variable length is a major challenge in a wide range of applications, such as speech recognition, language modeling, generative image modeling and machine translation. Here, we address this challenge by proposing a novel recurrent neural network (RNN) architecture, the Fast-Slow RNN (FS-RNN). The FS-RNN incorporates the strengths of both multiscale RNNs and deep transition RNNs as it processes sequential data on different timescales and learns complex transition functions from one time step to the next. We evaluate the FS-RNN on two character level language modeling data sets, Penn Treebank and Hutter Prize Wikipedia, where we improve state of the art results to $1.19$ and $1.25$ bits-per-character (BPC), respectively. In addition, an ensemble of two FS-RNNs achieves $1.20$ BPC on Hutter Prize Wikipedia outperforming the best known compression algorithm with respect to the BPC measure. We also present an empirical investigation of the learning and network dynamics of the FS-RNN, which explains the improved performance compared to other RNN architectures. Our approach is general as any kind of RNN cell is a possible building block for the FS-RNN architecture, and thus can be flexibly applied to different tasks.
• End-to-end training from scratch of current deep architectures for new computer vision problems would require Imagenet-scale datasets, and this is not always possible. In this paper we present a method that is able to take advantage of freely available multi-modal content to train computer vision algorithms without human supervision. We put forward the idea of performing self-supervised learning of visual features by mining a large scale corpus of multi-modal (text and image) documents. We show that discriminative visual features can be learnt efficiently by training a CNN to predict the semantic context in which a particular image is more probable to appear as an illustration. For this we leverage the hidden semantic structures discovered in the text corpus with a well-known topic modeling technique. Our experiments demonstrate state of the art performance in image classification, object detection, and multi-modal retrieval compared to recent self-supervised or natural-supervised approaches.
• Even in the absence of clocks, time bounds on the duration of actions enable the use of time for distributed coordination. This paper initiates an investigation of coordination in such a setting. A new communication structure called a zigzag pattern is introduced, and shown to guarantee bounds on the relative timing of events in this clockless model. Indeed, zigzag patterns are shown to be necessary and sufficient for establishing that events occur in a manner that satisfies prescribed bounds. We capture when a process can know that an appropriate zigzag pattern exists, and use this to provide necessary and sufficient conditions for timed coordination of events using a full-information protocol in the clockless model.
• In this paper, we design a multimodal framework for object detection, recognition and mapping based on the fusion of stereo camera frames, point cloud Velodyne Lidar scans, and Vehicle-to-Vehicle (V2V) Basic Safety Messages (BSMs) exchanged using Dedicated Short Range Communication (DSRC). We merge the key features of rich texture descriptions of objects from 2D images, depth and distance between objects provided by 3D point cloud and awareness of hidden vehicles from BSMs' 3D information. We present a joint pixel to point cloud and pixel to V2V correspondences of objects in frames from the Kitti Vision Benchmark Suite by using a semi-supervised manifold alignment approach to achieve camera-Lidar and camera-V2V mapping of their recognized objects that have the same underlying manifold.
• May 25 2017 cs.CV arXiv:1705.08623v1
Recently, learning equivariant representations has attracted considerable research attention. Dieleman et al. introduce four operations which can be inserted to CNN to learn deep representations equivariant to rotation. However, feature maps should be copied and rotated four times in each layer in their approach, which causes much running time and memory overhead. In order to address this problem, we propose Deep Rotation Equivariant Network(DREN) consisting of cycle layers, isotonic layers and decycle layers.Our proposed layers apply rotation transformation on filters rather than feature maps, achieving a speed up of more than 2 times with even less memory overhead. We evaluate DRENs on Rotated MNIST and CIFAR-10 datasets and demonstrate that it can improve the performance of state-of-the-art architectures. Our codes are released on GitHub.
• May 25 2017 stat.ML cs.LG arXiv:1705.08621v1
We consider the task of collaborative preference completion: given a pool of items, a pool of users and a partially observed item-user rating matrix, the goal is to recover the personalized ranking of each user over all of the items. Our approach is nonparametric: we assume that each item $i$ and each user $u$ have unobserved features $x_i$ and $y_u$, and that the associated rating is given by $g_u(f(x_i,y_u))$ where $f$ is Lipschitz and $g_u$ is a monotonic transformation that depends on the user. We propose a $k$-nearest neighbors-like algorithm and prove that it is consistent. To the best of our knowledge, this is the first consistency result for the collaborative preference completion problem in a nonparametric setting. Finally, we conduct experiments on the Netflix and Movielens datasets that suggest that our algorithm has some advantages over existing neighborhood-based methods and that its performance is comparable to some state-of-the art matrix factorization methods.
• Contextual bandits are a form of multi-armed bandit in which the agent has access to predictive side information (known as the context) for each arm at each time step, and have been used to model personalized news recommendation, ad placement, and other applications. In this work, we propose a multi-task learning framework for contextual bandit problems. Like multi-task learning in the batch setting, the goal is to leverage similarities in contexts for different arms so as to improve the agent's ability to predict rewards from contexts. We propose an upper confidence bound-based multi-task learning algorithm for contextual bandits, establish a corresponding regret bound, and interpret this bound to quantify the advantages of learning in the presence of high task (arm) similarity. We also describe an effective scheme for estimating task similarity from data, and demonstrate our algorithm's performance on several data sets.
• Template matching by normalized cross correlation (NCC) is widely used for finding image correspondences. We improve the robustness of this algorithm by preprocessing images with "siamese" convolutional networks trained to maximize the contrast between NCC values of true and false matches. The improvement is quantified using patches of brain images from serial section electron microscopy. Relative to a parameter-tuned bandpass filter, siamese convolutional networks significantly reduce false matches. Furthermore, all false matches can be eliminated by removing a tiny fraction of all matches based on NCC values. The improved accuracy of our method could be essential for connectomics, because emerging petascale datasets may require billions of template matches to assemble 2D images of serial sections into a 3D image stack. Our method is also expected to generalize to many other computer vision applications that use NCC template matching to find image correspondences.
• Given large amount of real photos for training, Convolutional neural network shows excellent performance on object recognition tasks. However, the process of collecting data is so tedious and the background are also limited which makes it hard to establish a perfect database. In this paper, our generative model trained with synthetic images rendered from 3D models reduces the workload of data collection and limitation of conditions. Our structure is composed of two sub-networks: semantic foreground object reconstruction network based on Bayesian inference and classification network based on multi-triplet cost function for avoiding over-fitting problem on monotone surface and fully utilizing pose information by establishing sphere-like distribution of descriptors in each category which is helpful for recognition on regular photos according to poses, lighting condition, background and category information of rendered images. Firstly, our conjugate structure called generative model with metric learning utilizing additional foreground object channels generated from Bayesian rendering as the joint of two sub-networks. Multi-triplet cost function based on poses for object recognition are used for metric learning which makes it possible training a category classifier purely based on synthetic data. Secondly, we design a coordinate training strategy with the help of adaptive noises acting as corruption on input images to help both sub-networks benefit from each other and avoid inharmonious parameter tuning due to different convergence speed of two sub-networks. Our structure achieves the state of the art accuracy of over 50\% on ShapeNet database with data migration obstacle from synthetic images to real photos. This pipeline makes it applicable to do recognition on real images only based on 3D models.
• It is oftentimes impossible to understand how machine learning models reach a decision. While recent research has proposed various technical approaches to provide some clues as to how a learning model makes individual decisions, they cannot provide users with ability to inspect a learning model as a complete entity. In this work, we propose a new technical approach that augments a Bayesian regression mixture model with multiple elastic nets. Using the enhanced mixture model, we extract explanations for a target model through global approximation. To demonstrate the utility of our approach, we evaluate it on different learning models covering the tasks of text mining and image recognition. Our results indicate that the proposed approach not only outperforms the state-of-the-art technique in explaining individual decisions but also provides users with an ability to discover the vulnerabilities of a learning model.
• We formulate the problem of supervised hashing, or learning binary embeddings of data, as a learning to rank problem. Specifically, we optimize two common ranking-based evaluation metrics, Average Precision (AP) and Normalized Discounted Cumulative Gain (NDCG). Observing that ranking with the discrete Hamming distance naturally results in ties, we propose to use tie-aware versions of ranking metrics in both the evaluation and the learning of supervised hashing. For AP and NDCG, we derive continuous relaxations of their tie-aware versions, and optimize them using stochastic gradient ascent with deep neural networks. Our results establish the new state-of-the-art for tie-aware AP and NDCG on common hashing benchmarks.
• In this work, we present the Grounded Recurrent Neural Network (GRNN), a recurrent neural network architecture for multi-label prediction which explicitly ties labels to specific dimensions of the recurrent hidden state (we call this process "grounding"). The approach is particularly well-suited for extracting large numbers of concepts from text. We apply the new model to address an important problem in healthcare of understanding what medical concepts are discussed in clinical text. Using a publicly available dataset derived from Intensive Care Units, we learn to label a patient's diagnoses and procedures from their discharge summary. Our evaluation shows a clear advantage to using our proposed architecture over a variety of strong baselines.
• We prove Lieb-Robinson bounds for a general class of lattice fermion systems. By making use of a suitable conditional expectation onto subalgebras of the CAR algebra, we can apply the Lieb-Robinson bounds much in the same way as for quantum spin systems. We preview how to obtain the spectral flow automorphisms and to prove stability of the spectral gap for frustration-free gapped systems satisfying a Local Topological Quantum Order condition.
• Mammogram classification is directly related to computer-aided diagnosis of breast cancer. Traditional methods rely on regions of interest (ROIs) which require great efforts to annotate. Inspired by the success of using deep convolutional features for natural image analysis and multi-instance learning (MIL) for labeling a set of instances/patches, we propose end-to-end trained deep multi-instance networks for mass classification based on whole mammogram without the aforementioned ROIs. We explore three different schemes to construct deep multi-instance networks for whole mammogram classification. Experimental results on the INbreast dataset demonstrate the robustness of proposed networks compared to previous work using segmentation and detection annotations.
• In a world awash with data, the ability to think and compute with data has become an important skill for students in many fields. For that reason, inclusion of some level of statistical computing in many introductory-level courses has grown more common in recent years. Existing literature has documented multiple success stories of teaching statistics with R, bolstered by the capabilities of R Markdown. In this article, we present an in-class data visualization activity intended to expose students to R and R Markdown during the first week of an introductory statistics class. The activity begins with a brief lecture on exploratory data analysis in R. Students are then placed in small groups tasked with exploring a new dataset to produce three visualizations that describe particular insights that are not immediately obvious from the data. Upon completion, students will have produced a series of univariate and multivariate visualizations on a real dataset and practiced describing them.

Felix Leditzky May 24 2017 20:43 UTC

Yes, that's right, thanks!

For (5), you use the Cauchy-Schwarz inequality $\left| \operatorname{tr}(X^\dagger Y) \right| \leq \sqrt{\operatorname{tr}(X^\dagger X)} \sqrt{\operatorname{tr}(Y^\dagger Y)}$ for the Hilbert-Schmidt inner product $\langle X,Y\rangle := \operatorname{tr}(X^\dagger Y)$ wi

...(continued)
Michael Tolan May 24 2017 20:27 UTC

Just reading over Eq (5) on P5 concerning the diamond norm.

Should the last $\sigma_1$ on the 4th line be replaced with a $\sigma_2$? I think I can see how the proof is working but not entirely certain.

Noon van der Silk May 23 2017 11:15 UTC

I think this thread has reached it's end.

I've locked further comments, and I hope that the quantum computing community can thoughtfully find an approach to language that is inclusive to all and recognises the diverse background of all researchers, current and future.

...(continued)
Varun Narasimhachar May 23 2017 02:14 UTC

While I would never want to antagonize my peers or to allow myself to assume they were acting irrationally, I do share your concerns to an extent. I worry about the association of social justice and inclusivity with linguistic engineering, virtual lynching, censorship, etc. (the latter phenomena sta

...(continued)
Aram Harrow May 23 2017 01:30 UTC

I think you are just complaining about issues that arise from living with other people in the same society. If you disagree with their values, well, then some of them might have a negative opinion about you. If you express yourself in an aggressive way, and use words like "lynch" to mean having pe

...(continued)
Steve Flammia May 23 2017 01:04 UTC

I agree with Noon that the discussion is becoming largely off topic for SciRate, but that it might still be of interest to the community to discuss this. I invite people to post thoughtful and respectful comments over at [my earlier Quantum Pontiff post][1]. Further comments here on SciRate will be

...(continued)
Noon van der Silk May 23 2017 00:59 UTC

I've moderated a few comments on this post because I believe it has gone past useful discussion, and I'll continue to remove comments that I believe don't add anything of substantial value.

Thanks.

Aram Harrow May 22 2017 23:13 UTC

The problem with your argument is that no one is forcing anyone to say anything, or banning anything.

If the terms really were offensive or exclusionary or had other bad side effects, then it's reasonable to discuss as a community whether to keep them, and possibly decide to stop using them. Ther

...(continued)
stan May 22 2017 22:53 UTC

Fair enough. At the end of the day I think most of us are concerned with the strength of the result not the particular language used to describe it.

VeteranVandal May 22 2017 22:41 UTC

But how obvious is ancilla? To me it is not even remotely obvious (nor clear as a term, but as the literature used it so much, I see such word in much the same way as I see auxiliary, in fact - now if you want to take offense with auxiliary, what can I say? I won't invent words just to please you).

...(continued)