# Top arXiv papers

• Robust quantum computation requires encoding delicate quantum information into degrees of freedom that are hard for the environment to change. Quantum encodings have been demonstrated in many physical systems by observing and correcting storage errors, but applications require not just storing information; we must accurately compute even with faulty operations. The theory of fault-tolerant quantum computing illuminates a way forward by providing a foundation and collection of techniques for limiting the spread of errors. Here we implement one of the smallest quantum codes in a five-qubit superconducting transmon device and demonstrate fault-tolerant state preparation. We characterize the resulting codewords through quantum process tomography and study the free evolution of the logical observables. Our results are consistent with fault-tolerant state preparation in a protected qubit subspace.
• We show that Clifford operations on qubit stabilizer states are non-contextual and can be represented by non-negative quasi-probability distributions associated with a Wigner-Weyl-Moyal formalism. This is accomplished by generalizing the Wigner-Weyl-Moyal formalism to three generators instead of two---producing an exterior, or Grassmann, algebra---which results in Clifford group gates for qubits that act as a permutation on the finite Weyl phase space points naturally associated with stabilizer states. As a result, a non-negative probability distribution can be associated with each stabilizer state's three-generator Wigner function, and these distributions evolve deterministically to one other under Clifford gates. This corresponds to a hidden variable theory that is non-contextual and local for qubit Clifford operations. Equivalently, we show that qubit Clifford gates can be expressed as propagators within the three-generator Wigner-Weyl-Moyal formalism whose semiclassical expansion is truncated at order $\hbar^0$ with a finite number of terms. The $T$-gate, which extends the Clifford gate set to one capable of universal quantum computation, requires a semiclassical expansion of its propagator to order $\hbar^1$. We compare this approach to previous quasi-probability descriptions of qubits that relied on the two-generator Wigner-Weyl-Moyal formalism and find that the two-generator Weyl symbols of stabilizer states result in a description of evolution under Clifford gates that is state-dependent. We show that this two-generator description of stabilizer evolution is thus a non-local and contextual hidden variable theory---it is a contextual description of a non-contextual process. We have thus extended the established result that Clifford stabilizer operations are non-contextual and have non-negative quasi-probability distributions in the odd $d$-dimensional case to $d=2$ qubits.
• One of the most widely known building blocks of modern physics is Heisenberg's indeterminacy principle. Among the different statements of this fundamental property of the full quantum mechanical nature of physical reality, the uncertainty relation for energy and time has a special place. Its interpretation and its consequences have inspired continued research efforts for almost a century. In its modern formulation, the uncertainty relation is understood as setting a fundamental bound on how fast any quantum system can evolve. In this Topical Review we describe important milestones, such as the Mandelstam-Tamm and the Margolus-Levitin bounds on the quantum speed limit, and summarise recent applications in a variety of current research fields -- including quantum information theory, quantum computing, and quantum thermodynamics amongst several others. To bring order and to provide an access point into the many different notions and concepts, we have grouped the various approaches into the minimal time approach and the geometric approach, where the former relies on quantum control theory, and the latter arises from measuring the distinguishability of quantum states. Due to the volume of the literature, this Topical Review can only present a snapshot of the current state-of-the-art and can never be fully comprehensive. Therefore, we highlight but a few works hoping that our selection can serve as a representative starting point for the interested reader.
• Random quantum circuits yield minimally structured models for chaotic quantum dynamics, able to capture for example universal properties of entanglement growth. We provide exact results and coarse-grained models for the spreading of operators by quantum circuits made of Haar-random unitaries. We study both 1+1D and higher dimensions, and argue that the coarse-grained pictures carry over to operator spreading in generic many-body systems. In 1+1D, we demonstrate that the out-of-time-order correlator (OTOC) satisfies a biased diffusion equation, which gives exact results for the spatial profile of the OTOC, and the butterfly speed $v_{B}$. We find that in 1+1D the front' of the OTOC broadens diffusively, with a width scaling in time as $t^{1/2}$. We address fluctuations in the OTOC between different realizations of the random circuit, arguing that they are negligible in comparison to the broadening of the front. Turning to higher D, we show that the averaged OTOC can be understood exactly via a remarkable correspondence with a classical droplet growth problem. This implies that the width of the front of the averaged OTOC scales as $t^{1/3}$ in 2+1D and $t^{0.24}$ in 3+1D (KPZ exponents). We support our analytic argument with simulations in 2+1D. We point out that, in a lattice model, the late time shape of the spreading operator is in general not spherical. However when full spatial rotational symmetry is present in 2+1D, our mapping implies an exact asymptotic form for the OTOC in terms of the Tracy-Widom distribution. For an alternative perspective on the OTOC in 1+1D, we map it to the partition function of an Ising-like model. Thanks to special structure arising from unitarity, this partition function reduces to a random walk calculation which can be performed exactly. We also use this mapping to give exact results for entanglement growth in 1+1D circuits.
• This paper reports on an experiment realized on the IBM 5Q chip which demonstrates strong evidence for the advantage of using error detection and fault-tolerant design of quantum circuits. By showing that fault-tolerant quantum computation is already within our reach, the author hopes to encourage this approach.
• Grid states form a discrete set of mixed quantum states that can be described by graphs. We characterize the entanglement properties of these states and provide methods to evaluate entanglement criteria for grid states in a graphical way. With these ideas we find bound entangled grid states for two-particle systems of any dimension and multiparticle grid states that provide examples for the different aspects of genuine multiparticle entanglement. Our findings suggest that entanglement theory for grid states, although being a discrete set, has already a complexity similar to the one for general states.
• An important class of contextuality arguments in quantum foundations are the All-versus-Nothing (AvN) proofs, generalising a construction originally due to Mermin. We present a general formulation of All-versus-Nothing arguments, and a complete characterisation of all such arguments which arise from stabiliser states. We show that every AvN argument for an n-qubit stabiliser state can be reduced to an AvN proof for a three-qubit state which is local Clifford-equivalent to the tripartite GHZ state. This is achieved through a combinatorial characterisation of AvN arguments, the AvN triple Theorem, whose proof makes use of the theory of graph states. This result enables the development of a computational method to generate all the AvN arguments in $\mathbb{Z}_2$ on n-qubit stabiliser states. We also present new insights into the stabiliser formalism and its connections with logic.
• We consider the contextual fraction as a quantitative measure of contextuality of empirical models, i.e. tables of probabilities of measurement outcomes in an experimental scenario. It provides a general way to compare the degree of contextuality across measurement scenarios; it bears a precise relationship to violations of Bell inequalities; its value, and a witnessing inequality, can be computed using linear programming; it is monotone with respect to the "free" operations of a resource theory for contextuality; and it measures quantifiable advantages in informatic tasks, such as games and a form of measurement based quantum computing.
• We introduce a framework for graphical security proofs in device-independent quantum cryptography using the methods of categorical quantum mechanics. We are optimistic that this approach will make some of the highly complex proofs in quantum cryptography more accessible, facilitate the discovery of new proofs, and enable automated proof verification. As an example of our framework, we reprove a recent result from device-independent quantum cryptography: any linear randomness expansion protocol can be converted into an unbounded randomness expansion protocol. We give a graphical exposition of a proof of this result and implement parts of it in the Globular proof assistant.
• Communication over a noisy channel is often conducted in a setting in which different input symbols to the channel incur a certain cost. For example, for the additive white Gaussian noise channel, the cost associated with a real number input symbol is the square of its magnitude. In such a setting, it is often useful to know the maximum amount of information that can be reliably transmitted per cost incurred. This is known as the capacity per unit cost. In this paper, we generalize the capacity per unit cost to various communication tasks involving a quantum channel; in particular, we consider classical communication, entanglement-assisted classical communication, private communication, and quantum communication. For each task, we define the corresponding capacity per unit cost and derive a formula for it via the expression for the capacity per channel use. Furthermore, for the special case in which there is a zero-cost quantum state, we obtain expressions for the various capacities per unit cost in terms of an optimized relative entropy involving the zero-cost state. For each communication task, we construct an explicit pulse-position-modulation coding scheme that achieves the capacity per unit cost. Finally, we compute capacities per unit cost for various quantum Gaussian channels.
• May 24 2017 quant-ph arXiv:1705.07911v1
Contextuality is a fundamental feature of quantum theory and is necessary for quantum computation and communication. Serious steps have therefore been taken towards a formal framework for contextuality as an operational resource. However, the most important component for a resource theory - a concrete, explicit form for the free operations of contextuality - was still missing. Here we provide such a component by introducing noncontextual wirings: a physically-motivated class of contextuality-free operations with a friendly parametrization. We characterize them completely for the general case of black-box measurement devices with arbitrarily many inputs and outputs. As applications, we show that the relative entropy of contextuality is a contextuality monotone and that maximally contextual boxes that serve as contextuality bits exist for a broad class of scenarios. Our results complete a unified resource-theoretic framework for contextuality and Bell nonlocality.
• In this paper we improve the layered implementation of arbitrary stabilizer circuits introduced by Aaronson and Gottesman in Phys. Rev. A 70(052328), 2004: to implement a general stabilizer circuit, we reduce their 11-stage computation -H-C-P-C-P-C-H-P-C-P-C- over the gate set consisting of Hadamard, Controlled-NOT, and Phase gates, into a 7-stage computation of the form -C-CZ-P-H-P-CZ-C-. We show arguments in support of using -CZ- stages over the -C- stages: not only the use of -CZ- stages allows a shorter layered expression, but -CZ- stages are simpler and appear to be easier to implement compared to the -C- stages. Based on this decomposition, we develop a two-qubit gate depth-$(14n{-}4)$ implementation of stabilizer circuits over the gate library H, P, CNOT, executable in the LNN architecture, improving best previously known depth-$25n$ circuit, also executable in the LNN architecture. Our constructions rely on Bruhat decomposition of the symplectic group and on folding arbitrarily long sequences of the form $($-P-C-$)^m$ into a 3-stage computation -P-CZ-C-. Our results include the reduction of the $11$-stage decomposition -H-C-P-C-P-C-H-P-C-P-C- into a $9$-stage decomposition of the form -C-P-C-P-H-C-P-C-P-. This reduction is based on the Bruhat decomposition of the symplectic group. This result also implies a new normal form for stabilizer circuits. We show that a circuit in this normal form is optimal in the number of Hadamard gates used. We also show that the normal form has an asymptotically optimal number of parameters.
• To run quantum algorithms on emerging gate-model quantum hardware, quantum circuits must be compiled to take into account constraints on the hardware. For near-term hardware, with only limited means to mitigate decoherence, it is critical to minimize the duration of the circuit. We investigate the application of temporal planners to the problem of compiling quantum circuits to newly emerging quantum hardware. While our approach is general, we focus on compiling to superconducting hardware architectures with nearest neighbor constraints. Our initial experiments focus on compiling Quantum Approximate Optimization Algorithm (QAOA) circuits whose high number of commuting gates allow great flexibility in the order in which the gates can be applied. That freedom makes it more challenging to find optimal compilations but also means there is a greater potential win from more optimized compilation than for less flexible circuits. We map this quantum circuit compilation problem to a temporal planning problem, and generated a test suite of compilation problems for QAOA circuits of various sizes to a realistic hardware architecture. We report compilation results from several state-of-the-art temporal planners on this test set. compile circuits of various sizes to a realistic hardware. This early empirical evaluation demonstrates that temporal planning is a viable approach to quantum circuit compilation.
• May 26 2017 quant-ph arXiv:1705.09212v1
Schroedinger's equation says that the Hamiltonian is the generator of time translations. This seems to imply that any reasonable definition of time operator must be conjugate to the Hamiltonian. Then both time and energy must have the same spectrum since conjugate operators are unitarily equivalent. Clearly this is not always true: normal Hamiltonians have lower bounded spectrum and often only have discrete eigenvalues, whereas we typically desire that time can take any real value. Pauli concluded that constructing a general a time operator is impossible (although clearly it can be done in specific cases). Here we show how the Pauli argument fails when one uses an external system (a "clock") to track time, so that time arises as correlations between the system and the clock (conditional probability amplitudes framework). In this case, the time operator is not conjugate to the system Hamiltonian, but its eigenvalues still satisfy the Schroedinger equation for arbitrary Hamiltonians.
• The discovery of topological states of matter has profoundly augmented our understanding of phase transitions in physical systems. Instead of local order parameters, topological phases are described by global topological invariants and are therefore robust against perturbations. A prominent example thereof is the two-dimensional integer quantum Hall effect. It is characterized by the first Chern number which manifests in the quantized Hall response induced by an external electric field. Generalizing the quantum Hall effect to four-dimensional systems leads to the appearance of a novel non-linear Hall response that is quantized as well, but described by a 4D topological invariant - the second Chern number. Here, we report on the first observation of a bulk response with intrinsic 4D topology and the measurement of the associated second Chern number. By implementing a 2D topological charge pump with ultracold bosonic atoms in an angled optical superlattice, we realize a dynamical version of the 4D integer quantum Hall effect. Using a small atom cloud as a local probe, we fully characterize the non-linear response of the system by in-situ imaging and site-resolved band mapping. Our findings pave the way to experimentally probe higher-dimensional quantum Hall systems, where new topological phases with exotic excitations are predicted.
• One of the most striking features of quantum theory is the existence of entangled states, responsible for Einstein's so called "spooky action at a distance". These states emerge from the mathematical formalism of quantum theory, but to date we do not have a clear idea of the physical principles that give rise to entanglement. Why does nature have entangled states? Would any theory superseding classical theory have entangled states, or is quantum theory special? One important feature of quantum theory is that it has a classical limit, recovering classical theory through the process of decoherence. We show that any theory with a classical limit must contain entangled states, thus establishing entanglement as an inevitable feature of any theory superseding classical theory.
• The aim of this contribution is to discuss relations between non-classical features, such as entanglement, incompatibility of measurements, steering and non-locality, in general probabilistic theories. We show that all these features are particular forms of entanglement, which leads to close relations between their quantifications. For this, we study the structure of the tensor products of a compact convex set with a semiclassical state space.
• Machine learning and data analysis now finds both scientific and industrial application in biology, chemistry, geology, medicine, and physics. These applications rely on large quantities of data gathered from automated sensors and user input. Furthermore, the dimensionality of many datasets is extreme: more details are being gathered about single user interactions or sensor readings. All of these applications encounter problems with a common theme: use observed data to make inferences about the world. Our work obtains the first provably efficient algorithms for Independent Component Analysis (ICA) in the presence of heavy-tailed data. The main tool in this result is the centroid body (a well-known topic in convex geometry), along with optimization and random walks for sampling from a convex body. This is the first algorithmic use of the centroid body and it is of independent theoretical interest, since it effectively replaces the estimation of covariance from samples, and is more generally accessible. This reduction relies on a non-linear transformation of samples from such an intersection of halfspaces (i.e. a simplex) to samples which are approximately from a linearly transformed product distribution. Through this transformation of samples, which can be done efficiently, one can then use an ICA algorithm to recover the vertices of the intersection of halfspaces. Finally, we again use ICA as an algorithmic primitive to construct an efficient solution to the widely-studied problem of learning the parameters of a Gaussian mixture model. Our algorithm again transforms samples from a Gaussian mixture model into samples which fit into the ICA model and, when processed by an ICA algorithm, result in recovery of the mixture parameters. Our algorithm is effective even when the number of Gaussians in the mixture grows polynomially with the ambient dimension
• Photonic platforms represent a promising technology for the realization of several quantum communication protocols and for experiments of quantum simulation. Moreover, large-scale integrated interferometers have recently gained a relevant role for restricted models of quantum computing, specifically with Boson Sampling devices. Indeed, various linear optical schemes have been proposed for the implementation of unitary transformations, each one suitable for a specific task. Notwithstanding, so far a comprehensive analysis of the state of the art under broader and realistic conditions is still lacking. In the present work we address this gap, providing in a unified framework a quantitative comparison of the three main photonic architectures, namely the ones with triangular and square designs and the so-called fast transformations. All layouts have been analyzed in presence of losses and imperfect control over the reflectivities and phases of the inner structure. Our results represent a further step ahead towards the implementation of quantum information protocols on large-scale integrated photonic devices.
• When a two-dimensional electron gas is exposed to a perpendicular magnetic field and an in-plane electric field, its conductance becomes quantized in the transverse in-plane direction: this is known as the quantum Hall (QH) effect. This effect is a result of the nontrivial topology of the system's electronic band structure, where an integer topological invariant known as the first Chern number leads to the quantization of the Hall conductance. Interestingly, it was shown that the QH effect can be generalized mathematically to four spatial dimensions (4D), but this effect has never been realized for the obvious reason that experimental systems are bound to three spatial dimensions. In this work, we harness the high tunability and control offered by photonic waveguide arrays to experimentally realize a dynamically-generated 4D QH system using a 2D array of coupled optical waveguides. The inter-waveguide separation is constructed such that the propagation of light along the device samples over higher-dimensional momenta in the directions orthogonal to the two physical dimensions, thus realizing a 2D topological pump. As a result, the device's band structure is associated with 4D topological invariants known as second Chern numbers which support a quantized bulk Hall response with a 4D symmetry. In a finite-sized system, the 4D topological bulk response is carried by localized edges modes that cross the sample as a function of of the modulated auxiliary momenta. We directly observe this crossing through photon pumping from edge-to-edge and corner-to-corner of our system. These are equivalent to the pumping of charge across a 4D system from one 3D hypersurface to the opposite one and from one 2D hyperedge to another, and serve as first experimental realization of higher-dimensional topological physics.
• The device-independent approach to physics is one where conclusions are drawn directly and solely from the observed correlations between measurement outcomes. In quantum information, this approach allows one to make strong statements about the properties of the underlying devices via the observation of Bell-inequality-violating correlations. However, since one can only perform a finite number of experimental trials, an important gap remains between the many theoretical tools developed for such purposes and the experimentally obtained raw data. In particular, a sensible way to estimate the underlying quantum distribution has so far remained elusive. Here, we propose a few methods to bridge this gap. Under the assumption that the experimental trials are independent and identically distributed, our methods allow one to achieve the analog of quantum state estimation in the device-independent setting by generating unique point estimates of the true quantum distribution. In addition, we provide strong numerical evidence showing that these estimates converge toward the underlying distribution with a rate at least as good as the relative frequencies. We further demonstrate how these estimates of the true quantum distribution can be used to provide sensible estimates of certain desired properties of the system, such as the amount of entanglement present in the measured system.
• We consider a thermodynamic system in which the working fluid is a quantized harmonic oscillator subjected to periodic squeezing operations at a rate much larger than its resonance frequency. When the oscillator--bath coupling is treated with the conventional rotating wave approximation (RWA) we find that the system can behave as a heat pump. Without the RWA a much richer picture emerges, including refrigeration and heat engine behaviors, and a new method of parametric cooling of the oscillator. This shows the emergence of quantum thermodynamical phenomena beyond those permitted in the RWA.
• Thermalization and scrambling are the subject of much recent study from the perspective of many-body quantum systems with locally bounded Hilbert spaces (spin chains'), quantum field theory and holography. We tackle this problem in 1D spin-chains evolving under random local unitary circuits and prove a number of exact results on the behavior of out-of-time-ordered commutators (OTOCs), and entanglement growth in this setting. These results follow from the observation that the spreading of operators in random circuits is described by a hydrodynamical' equation of motion, despite the fact that random unitary circuits do not have locally conserved quantities (e.g., no conserved energy). In this hydrodynamic picture quantum information travels in a front with a butterfly velocity' $v_{\text{B}}$ that is smaller than the light cone velocity of the system, while the front itself broadens diffusively in time. The OTOC increases sharply after the arrival of the light cone, but we do \emphnot observe a prolonged exponential regime of the form $\sim e^{\lambda_\text{L}(t-x/v)}$ for a fixed Lyapunov exponent $\lambda_\text{L}$. We find that the diffusive broadening of the front has important consequences for entanglement growth, leading to an entanglement velocity that can be significantly smaller than the butterfly velocity. We conjecture that the hydrodynamical description applies to more generic ergodic systems and support this by verifying numerically that the diffusive broadening of the operator wavefront also holds in a more traditional non-random Floquet spin-chain. We also compare our results to Clifford circuits, which have less rich hydrodynamics and consequently trivial OTOC behavior, but which can nevertheless exhibit linear entanglement growth and thermalization.
• We demonstrate a synchronized readout (SR) technique for spectrally selective detection of oscillating magnetic fields with sub-millihertz resolution, using coherent manipulation of solid state spins. The SR technique is implemented in a sensitive magnetometer (~50 picotesla/Hz^(1/2)) based on nitrogen vacancy (NV) centers in diamond, and used to detect nuclear magnetic resonance (NMR) signals from liquid-state samples. We obtain NMR spectral resolution ~3 Hz, which is nearly two orders of magnitude narrower than previously demonstrated with NV based techniques, using a sample volume of ~1 picoliter. This is the first application of NV-detected NMR to sense Boltzmann-polarized nuclear spin magnetization, and the first to observe chemical shifts and J-couplings.
• Photonic interference is a key quantum resource for optical quantum computation, and in particular for so-called boson sampling machines. In interferometers with certain symmetries, genuine multiphoton quantum interference effectively suppresses certain sets of events, as in the original Hong-Ou-Mandel effect. Recently, it was shown that some classical and semi-classical models could be ruled out by identifying such suppressions in Fourier interferometers. Here we propose a suppression law suitable for random-input experiments in multimode Sylvester interferometers, and verify it experimentally using 4- and 8-mode integrated interferometers. The observed suppression is stronger than what is observed in Fourier interferometers of the same size, and could be relevant to certification of boson sampling machines and other experiments relying on bosonic interference.
• In this paper, we study convergence properties of the gradient Expectation-Maximization algorithm \citelange1995gradient for Gaussian Mixture Models for general number of clusters and mixing coefficients. We derive the convergence rate depending on the mixing coefficients, minimum and maximum pairwise distances between the true centers and dimensionality and number of components; and obtain a near-optimal local contraction radius. While there have been some recent notable works that derive local convergence rates for EM in the two equal mixture symmetric GMM, in the more general case, the derivations need structurally different and non-trivial arguments. We use recent tools from learning theory and empirical processes to achieve our theoretical results.
• This paper introduces a video dataset of spatio-temporally localized Atomic Visual Actions (AVA). The AVA dataset densely annotates 80 atomic visual actions in 64k movie clips with actions localized in space and time, resulting in 197k action labels with multiple labels per human occurring frequently. The main differences with existing video datasets are: (1) the definition of atomic visual actions, which avoids collecting data for each and every complex action; (2) precise spatio-temporal annotations with possibly multiple annotations for each human; (3) the use of diverse, realistic video material (movies). This departs from existing datasets for spatio-temporal action recognition, such as JHMDB and UCF datasets, which provide annotations for at most 24 composite actions, such as basketball dunk, captured in specific environments, i.e., basketball court. We implement a state-of-the-art approach for action localization. Despite this, the performance on our dataset remains low and underscores the need for developing new approaches for video understanding. The AVA dataset is the first step in this direction, and enables the measurement of performance and progress in realistic scenarios.
• Even though the evolution of an isolated quantum system is unitary, the complexity of interacting many-body systems prevents the observation of recurrences of quantum states for all but the smallest systems. For large systems one can not access the full complexity of the quantum states and the requirements to observe a recurrence in experiments reduces to being close to the initial state with respect to the employed observable. Selecting an observable connected to the collective excitations in one-dimensional superfluids, we demonstrate recurrences of coherence and long range order in an interacting quantum many-body system containing thousands of particles. This opens up a new window into the dynamics of large quantum systems even after they reached a transient thermal-like state.
• What are the conditions for adiabatic quantum computation (AQC) to outperform classical computation? We consider the strong quantum speedup: scaling advantage in computational time over the best classical algorithms. Although there exist several quantum adiabatic algorithms achieving the strong quantum speedup, the essential keys to their speedups are still unclear. Here, we propose a necessary condition for the quantum speedup in AQC. This is a conjecture that a superposition of macroscopically distinct states appears during AQC if it achieves the quantum speedup. This is a natural extension of the conjecture in circuit-based quantum computation [A. Shimizu et al., J. Phys. Soc. Jpn. 82, 054801 (2013)]. To describe the statement of the conjecture, we introduce an index $p$ that quantifies a superposition of macroscopically distinct states---macroscopic entanglement---from the asymptotic behaviors of fluctuations of additive observables. We theoretically test the conjecture by investigating five quantum adiabatic algorithms. All the results show that the conjecture is correct for these algorithms. We therefore expect that a superposition of macroscopically distinct states is an appropriate indicator of entanglement crucial to the strong quantum speedup in AQC.
• We study implicit regularization when optimizing an underdetermined quadratic objective over a matrix $X$ with gradient descent on a factorization of $X$. We conjecture and provide empirical and theoretical evidence that with small enough step sizes and initialization close enough to the origin, gradient descent on a full dimensional factorization converges to the minimum nuclear norm solution.
• The evidence lower bound (ELBO) appears in many algorithms for maximum likelihood estimation (MLE) with latent variables because it is a sharp lower bound of the marginal log-likelihood. For neural latent variable models, optimizing the ELBO jointly in the variational posterior and model parameters produces state-of-the-art results. Inspired by the success of the ELBO as a surrogate MLE objective, we consider the extension of the ELBO to a family of lower bounds defined by a Monte Carlo estimator of the marginal likelihood. We show that the tightness of such bounds is asymptotically related to the variance of the underlying estimator. We introduce a special case, the filtering variational objectives (FIVOs), which takes the same arguments as the ELBO and passes them through a particle filter to form a tighter bound. FIVOs can be optimized tractably with stochastic gradients, and are particularly suited to MLE in sequential latent variable models. In standard sequential generative modeling tasks we present uniform improvements over models trained with ELBO, including some whole nat-per-timestep improvements.
• Some reflections are presented on the state of the search for a quantum theory of gravity. I discuss diverse regimes of possible quantum gravitational phenomenon, some well explored, some novel.
• We develop a no-go theorem for two-dimensional bosonic systems with crystal symmetries: if there is a half-integer spin at a rotation center, where the point-group symmetry is $\mathbb D_{2,4,6}$, such a system must have a ground-state degeneracy protected by the crystal symmetry. Such a degeneracy indicates either a broken-symmetry state or a unconventional state of matter. Comparing to the Lieb-Schultz-Mattis Theorem, our result counts the spin at each rotation center, instead of the total spin per unit cell, and therefore also applies to certain systems with an even number of half-integer spins per unit cell.
• Triangle-free graphs play a central role in graph theory, and triangle detection (or triangle finding) as well as triangle enumeration (triangle listing) play central roles in the field of graph algorithms. In distributed computing, algorithms with sublinear round complexity for triangle finding and listing have recently been developed in the powerful CONGEST clique model, where communication is allowed between any two nodes of the network. In this paper we present the first algorithms with sublinear complexity for triangle finding and triangle listing in the standard CONGEST model, where the communication topology is the same as the topology of the network. More precisely, we give randomized algorithms for triangle finding and listing with round complexity $O(n^{2/3}(\log n)^{2/3})$ and $O(n^{3/4}\log n)$, respectively, where $n$ denotes the number of nodes of the network. We also show a lower bound $\Omega(n^{1/3}/\log n)$ on the round complexity of triangle listing, which also holds for the CONGEST clique model.
• Training a Fully Convolutional Network (FCN) for semantic segmentation requires a large number of pixel-level masks, which involves a large amount of human labour and time for annotation. In contrast, image-level labels are much easier to obtain. In this work, we propose a novel method for weakly supervised semantic segmentation with only image-level labels. The method relies on a large scale co-segmentation framework that can produce object masks for a group of images containing objects belonging to the same semantic class. We first retrieve images from search engines, e.g. Flickr and Google, using semantic class names as queries, e.g. class names in PASCAL VOC 2012. We then use high quality masks produced by co-segmentation on the retrieved images as well as the target dataset images with image level labels to train segmentation networks. We obtain IoU 56.9 on test set of PASCAL VOC 2012, which reaches state of the art performance.
• The design of neural architectures for structured objects is typically guided by experimental insights rather than a formal process. In this work, we appeal to kernels over combinatorial structures, such as sequences and graphs, to derive appropriate neural operations. We introduce a class of deep recurrent neural operations and formally characterize their associated kernel spaces. Our recurrent modules compare the input to virtual reference objects (cf. filters in CNN) via the kernels. Similar to traditional neural operations, these reference objects are parameterized and directly optimized in end-to-end training. We empirically evaluate the proposed class of neural architectures on standard applications such as language modeling and molecular graph regression, achieving state-of-the-art or competitive results across these applications. We also draw connections to existing architectures such as LSTMs.
• This paper presents two unsupervised learning layers (UL layers) for label-free video analysis: one for fully connected layers, and the other for convolutional ones. The proposed UL layers can play two roles: they can be the cost function layer for providing global training signal; meanwhile they can be added to any regular neural network layers for providing local training signals and combined with the training signals backpropagated from upper layers for extracting both slow and fast changing features at layers of different depths. Therefore, the UL layers can be used in either pure unsupervised or semi-supervised settings. Both a closed-form solution and an online learning algorithm for two UL layers are provided. Experiments with unlabeled synthetic and real-world videos demonstrated that the neural networks equipped with UL layers and trained with the proposed online learning algorithm can extract shape and motion information from video sequences of moving objects. The experiments demonstrated the potential applications of UL layers and online learning algorithm to head orientation estimation and moving object localization.
• May 25 2017 stat.ML cs.LG arXiv:1705.08826v1
In this work, we introduce the average top-$k$ (AT$_k$) loss as a new ensemble loss for supervised learning, which is the average over the $k$ largest individual losses over a training dataset. We show that the AT$_k$ loss is a natural generalization of the two widely used ensemble losses, namely the average loss and the maximum loss, but can combines their advantages and mitigate their drawbacks to better adapt to different data distributions. Furthermore, it remains a convex function over all individual losses, which can lead to convex optimization problems that can be solved effectively with conventional gradient-based method. We provide an intuitive interpretation of the AT$_k$ loss based on its equivalent effect on the continuous individual loss functions, suggesting that it can reduce the penalty on correctly classified data. We further give a learning theory analysis of MAT$_k$ learning on the classification calibration of the AT$_k$ loss and the error bounds of AT$_k$-SVM. We demonstrate the applicability of minimum average top-$k$ learning for binary classification and regression using synthetic and real datasets.
• Characterization and certification of nonlocal correlations is one of the the central topics in quantum information theory. In this work, we develop the detection methods of entanglement and steering based on the universal uncertainty relations and fine-grained uncertainty relations. In the course of our study, the uncertainty relations are formulated in majorization form, and the uncertainty quantifier can be chosen as any convex Schur concave functions, this leads to a large set of inequalities, including all existing criteria based on entropies. We address the question that if all steerable states (or entangled states) can be witnessed by some uncertainty-based inequality, we find that for pure states and many important families of states, this is the case.
• In the ultra-strong coupling regime of a light-matter system, the ground state exhibits non-trivial entanglement between the atom and photons. For the purposes of exploring the measurement and control of this ground state, here we analyze the dynamics of such an ultra-strongly-coupled system interacting with a driven nonlinear resonator acting as a measurement apparatus. Interestingly, although the coupling between the atom and the nonlinear resonator is much smaller than the typical energy scales of the ultra-strongly-coupled system, we show that we can generate a strong correlation between the nonlinear resonator and the light-matter system. A subsequent coarse- grained measurement on the nonlinear resonator significantly affects the light-matter system, and the phase of the light changes depending on the measurement results. Also, we investigate the conditions for when the nonlinear resonator can be entangled with the ultra-strongly coupled system, which is the mechanism that allows us to project the ground state of the ultra-strongly coupled system into a non-energy eigenstate.
• Compression and computational efficiency in deep learning have become a problem of great significance. In this work, we argue that the most principled and effective way to attack this problem is by taking a Bayesian point of view, where through sparsity inducing priors we prune large parts of the network. We introduce two novelties in this paper: 1) we use hierarchical priors to prune nodes instead of individual weights, and 2) we use the posterior uncertainties to determine the optimal fixed point precision to encode the weights. Both factors significantly contribute to achieving the state of the art in terms of compression rates, while still staying competitive with methods designed to optimize for speed or energy efficiency.
• Generative moment matching network (GMMN) is a deep generative model that differs from Generative Adversarial Network (GAN) by replacing the discriminator in GAN with a two-sample test based on kernel maximum mean discrepancy (MMD). Although some theoretical guarantees of MMD have been studied, the empirical performance of GMMN is still not as competitive as that of GAN on challenging and large benchmark datasets. The computational efficiency of GMMN is also less desirable in comparison with GAN, partially due to its requirement for a rather large batch size during the training. In this paper, we propose to improve both the model expressiveness of GMMN and its computational efficiency by introducing adversarial kernel learning techniques, as the replacement of a fixed Gaussian kernel in the original GMMN. The new approach combines the key ideas in both GMMN and GAN, hence we name it MMD-GAN. The new distance measure in MMD-GAN is a meaningful loss that enjoys the advantage of weak topology and can be optimized via gradient descent with relatively small batch sizes. In our evaluation on multiple benchmark datasets, including MNIST, CIFAR- 10, CelebA and LSUN, the performance of MMD-GAN significantly outperforms GMMN, and is competitive with other representative GAN works.
• Large-scale kernel approximation is an important problem in machine learning research. Approaches using random Fourier features have become increasingly popular [Rahimi and Recht, 2007], where kernel approximation is treated as empirical mean estimation via Monte Carlo (MC) or Quasi-Monte Carlo (QMC) integration [Yang et al., 2014]. A limitation of the current approaches is that all the features receive an equal weight summing to 1. In this paper, we propose a novel shrinkage estimator from "Stein effect", which provides a data-driven weighting strategy for random features and enjoys theoretical justifications in terms of lowering the empirical risk. We further present an efficient randomized algorithm for large-scale applications of the proposed method. Our empirical results on six benchmark data sets demonstrate the advantageous performance of this approach over representative baselines in both kernel approximation and supervised learning tasks.
• Solving sequential decision making problems, such as text parsing, robotic control, and game playing, requires a combination of planning policies and generalisation of those plans. In this paper, we present Expert Iteration, a novel algorithm which decomposes the problem into separate planning and generalisation tasks. Planning new policies is performed by tree search, while a deep neural network generalises those plans. In contrast, standard Deep Reinforcement Learning algorithms rely on a neural network not only to generalise plans, but to discover them too. We show that our method substantially outperforms Policy Gradients in the board game Hex, winning 84.4% of games against it when trained for equal time.
• Today, the internet makes tremendous amounts of data widely available. Often, the same information is behind multiple different available data sets. This lends growing importance to latent variable models that try to learn the hidden information from the available imperfect versions. For example, social media platforms can contain an abundance of pictures of the same person or object, yet all of which are taken from different perspectives. In a simplified scenario, one may consider pictures taken from the same perspective, which are distorted by noise. This latter application allows for a rigorous mathematical treatment, which is the content of this contribution. We apply a recently developed method of dependent component analysis to image denoising when multiple distorted copies of one and the same image are available, each being corrupted by a different and unknown noise process. In a simplified scenario, we assume that the distorted image is corrupted by noise that acts independently on each pixel. We answer completely the question of how to perform optimal denoising, when at least three distorted copies are available: First we define optimality of an algorithm in the presented scenario, and then we describe an aymptotically optimal universal discrete denoising algorithm (UDDA). In the case of binary data and binary symmetric noise, we develop a simplified variant of the algorithm, dubbed BUDDA, which we prove to attain universal denoising uniformly.
• We give a streamlined derivation of "teleportation identities" for discrete and continuous systems. The identities do not depend on the choice of Bell basis and so are "coordinate free". The unitaries Bob needs to apply to recover Alice's unknown state is the product of the unitaries Alice and Bob use to generate a common Bell basis. The case of qubit, qudits and continuous systems are all treated on the same footing.
• Recently, Roger Colbeck and Renato Renner (C&R) have claimed that '[n]o extension of quantum theory can have improved predictive power'. If correct, this is a spectacular impossibility theorem for hidden variable theories, which is more general than the theorems of Bell and Leggett. C&R's claim essentially means that in any hidden variable theory that is compatible with quantum-mechanical predictions, probabilities of measurement outcomes are independent of these hidden variables. On closer inspection, however, the generality and validity of the claim can be contested. First, it is based on an assumption called 'Freedom of Choice'. As the name suggests, this assumption involves the independence of an experimenter's choice of measurement settings. But in the way C&R define this assumption, a no-signalling condition is surreptitiously presupposed, making the assumption less innocent than it sounds. When using this definition, any hidden variable theory violating Parameter Independence, such as Bohmian Mechanics, is immediately shown to be incompatible with quantum-mechanical predictions. Also, the argument of C&R is hard to follow and their mathematical derivation contains several gaps, some of which cannot be closed in the way they suggest. We shall show that these gaps can be filled. The issue with the 'Freedom of Choice' assumption can be circumvented by explicitly assuming Parameter Independence. This makes the result less general, but better founded. We then obtain an impossibility theorem for hidden variable theories satisfying Parameter Independence only. So, while quantum mechanics itself satisfies Parameter Independence, if a variable is added that changes the outcome probabilities, however slightly, Parameter Independence must be violated.
• May 24 2017 cs.CV arXiv:1705.08041v1
A broad class of problems at the core of computational imaging, sensing, and low-level computer vision reduces to the inverse problem of extracting latent images that follow a prior distribution, from measurements taken under a known physical image formation model. Traditionally, hand-crafted priors along with iterative optimization methods have been used to solve such problems. In this paper we present unrolled optimization with deep priors, a principled framework for infusing knowledge of the image formation into deep networks that solve inverse problems in imaging, inspired by classical iterative methods. We show that instances of the framework outperform the state-of-the-art by a substantial margin for a wide variety of imaging problems, such as denoising, deblurring, and compressed sensing magnetic resonance imaging (MRI). Moreover, we conduct experiments that explain how the framework is best used and why it outperforms previous methods.
• There is a pressing need to build an architecture that could subsume these networks undera unified framework that achieves both higher performance and less overhead. To this end, two fundamental issues are yet to be addressed. The first one is how to implement the back propagation when neuronal activations are discrete. The second one is how to remove the full-precision hidden weights in the training phase to break the bottlenecks of memory/computation consumption. To address the first issue, we present a multistep neuronal activation discretization method and a derivative approximation technique that enable the implementing the back propagation algorithm on discrete DNNs. While for the second issue, we propose a discrete state transition (DST) methodology to constrain the weights in a discrete space without saving the hidden weights. In this way, we build a unified framework that subsumes the binary or ternary networks as its special cases.More particularly, we find that when both the weights and activations become ternary values, the DNNs can be reduced to gated XNOR networks (or sparse binary networks) since only the event of non-zero weight and non-zero activation enables the control gate to start the XNOR logic operations in the original binary networks. This promises the event-driven hardware design for efficient mobile intelligence. We achieve advanced performance compared with state-of-the-art algorithms. Furthermore,the computational sparsity and the number of states in the discrete space can be flexibly modified to make it suitable for various hardware platforms.
• Randomized binary exponential backoff (BEB) is a popular algorithm for coordinating access to a shared channel. With an operational history exceeding four decades, BEB is currently an important component of several wireless standards. Despite this track record, prior theoretical results indicate that under bursty traffic (1) BEB yields poor makespan and (2) superior algorithms are possible. To date, the degree to which these findings manifest in practice has not been resolved. To address this issue, we examine one of the strongest cases against BEB: $n$ packets that simultaneously begin contending for the wireless channel. Using Network Simulator 3, we compare against more recent algorithms that are inspired by BEB, but whose makespan guarantees are superior. Surprisingly, we discover that these newer algorithms significantly underperform. Through further investigation, we identify as the culprit a flawed but common abstraction regarding the cost of collisions. Our experimental results are complemented by analytical arguments that the number of collisions -- and not solely makespan -- is an important metric to optimize. We believe that these findings have implications for the design of contention-resolution algorithms.

Michael Tolan May 26 2017 18:41 UTC

That's what I was thinking, thanks!

Eddie Smolansky May 26 2017 05:23 UTC

Updated summary [here](https://github.com/eddiesmo/papers).

# How they made the dataset
- automated filtering with yolo and landmark detection projects
- crowd source final filtering (AMT - give 50 face images to turks and ask which don't belong)
- quality control through s

...(continued)
Felix Leditzky May 24 2017 20:43 UTC

Yes, that's right, thanks!

For (5), you use the Cauchy-Schwarz inequality $\left| \operatorname{tr}(X^\dagger Y) \right| \leq \sqrt{\operatorname{tr}(X^\dagger X)} \sqrt{\operatorname{tr}(Y^\dagger Y)}$ for the Hilbert-Schmidt inner product $\langle X,Y\rangle := \operatorname{tr}(X^\dagger Y)$ wi

...(continued)
Michael Tolan May 24 2017 20:27 UTC

Just reading over Eq (5) on P5 concerning the diamond norm.

Should the last $\sigma_1$ on the 4th line be replaced with a $\sigma_2$? I think I can see how the proof is working but not entirely certain.

Noon van der Silk May 23 2017 11:15 UTC

I think this thread has reached it's end.

I've locked further comments, and I hope that the quantum computing community can thoughtfully find an approach to language that is inclusive to all and recognises the diverse background of all researchers, current and future.

...(continued)
Varun Narasimhachar May 23 2017 02:14 UTC

While I would never want to antagonize my peers or to allow myself to assume they were acting irrationally, I do share your concerns to an extent. I worry about the association of social justice and inclusivity with linguistic engineering, virtual lynching, censorship, etc. (the latter phenomena sta

...(continued)
Aram Harrow May 23 2017 01:30 UTC

I think you are just complaining about issues that arise from living with other people in the same society. If you disagree with their values, well, then some of them might have a negative opinion about you. If you express yourself in an aggressive way, and use words like "lynch" to mean having pe

...(continued)
Steve Flammia May 23 2017 01:04 UTC

I agree with Noon that the discussion is becoming largely off topic for SciRate, but that it might still be of interest to the community to discuss this. I invite people to post thoughtful and respectful comments over at [my earlier Quantum Pontiff post][1]. Further comments here on SciRate will be

...(continued)
Noon van der Silk May 23 2017 00:59 UTC

I've moderated a few comments on this post because I believe it has gone past useful discussion, and I'll continue to remove comments that I believe don't add anything of substantial value.

Thanks.

Aram Harrow May 22 2017 23:13 UTC

The problem with your argument is that no one is forcing anyone to say anything, or banning anything.

If the terms really were offensive or exclusionary or had other bad side effects, then it's reasonable to discuss as a community whether to keep them, and possibly decide to stop using them. Ther

...(continued)