- We show how to approximately represent a quantum state using the square root of the usual amount of classical memory. The classical representation of an n-qubit state $\psi$ consists of its inner products with $O(\sqrt{2^n})$ stabilizer states. A quantum state initially specified by its $2^n$ entries in the computational basis can be compressed to this form in time $O(2^n \mathrm{poly}(n))$, and, subsequently, the compressed description can be used to additively approximate the expectation value of an arbitrary observable. Our compression scheme directly gives a new protocol for the vector in subspace problem with randomized one-way communication complexity that matches (up to polylogarithmic factors) the best known upper bound, due to Raz. We obtain an exponential improvement over Raz's protocol in terms of computational efficiency.
- Jan 17 2018 quant-ph cond-mat.str-el arXiv:1801.05390v1We describe an approach to fix the gauge degrees of freedom in tensor networks, including those with closed loops, which allows a canonical form for arbitrary tensor networks to be realized. Additionally, a measure for the internal correlations present in a tensor network is proposed, which quantifies the extent of resonances around closed loops in the network. Finally we describe an algorithm for the optimal truncation of an internal index from a tensor network, based upon proper removal of the redundant internal correlations. These results, which offer a unified theoretical framework for the manipulation of tensor networks with closed loops, can be applied to improve existing tensor network methods for the study of many-body systems and may also constitute key algorithmic components of sophisticated new tensor methods.
- Jan 18 2018 quant-ph arXiv:1801.05450v1We develop a general framework characterizing the structure and properties of quantum resource theories for continuous-variable Gaussian states and Gaussian operations, establishing methods for their description and quantification. We show in particular that, under a few intuitive and physically-motivated assumptions on the set of free states, no Gaussian quantum resource can be distilled with Gaussian free operations, even when an unlimited supply of the resource state is available. This places fundamental constraints on state transformations in all such Gaussian resource theories. Our methods rely on the definition of a general Gaussian resource quantifier whose value does not change when multiple copies are considered. We discuss in particular the applications to quantum entanglement, where we extend previously known results by showing that Gaussian entanglement cannot be distilled even with Gaussian operations preserving the positivity of the partial transpose, as well as to other Gaussian resources such as steering and optical nonclassicality. A unified semidefinite programming representation of all these resources is provided.
- Jan 19 2018 quant-ph arXiv:1801.06121v1Randomized benchmarking provides a tool for obtaining precise quantitative estimates of the average error rate of a physical quantum channel. Here we define real randomized benchmarking, which enables a separate determination of the average error rate in the real and complex parts of the channel. This provides more fine-grained information about average error rates with approximately the same cost as the standard protocol. The protocol requires only averaging over the real Clifford group, a subgroup of the full Clifford group, and makes use of the fact that it forms an orthogonal 2-design. Our results are therefore especially useful when considering quantum computations on rebits (or real encodings of complex computations), in which case the real Clifford group now plays the role of the complex Clifford group when studying stabilizer circuits.
- Jan 17 2018 quant-ph arXiv:1801.05237v1The so-called information-thermodynamics link has been created in a series of works starting from Maxwell demon and established by the idea of transforming information into work in the though experiment of Szilard which then evolved into the vast field of research. The aim of this note is firstly to present two new models of the Szilard engine containing arbitrary number of molecules which show irrelevance of acquiring information for work extraction. Secondly, the difference between the definition of entropy for ergodic systems and systems with ergodicity breaking constraints is emphasized. The role of nonergodic systems as information carriers and the thermodynamic cost of stability and accuracy of information encoding and processing is briefly discussed.
- We analyze certain class of linear maps on matrix algebras that become entanglement breaking after composing a finite or infinite number of times with themselves. This means that the Choi matrix of the iterated linear map becomes separable in the tensor product space. If a linear map is entanglement breaking after finite iterations, we say the map has a finite index of separability. In particular we show that every unital PPT-channel becomes entanglement breaking after a finite number of iterations. It turns out that the class of unital channels that have finite index of separability is a dense subset of the unital channels. We construct concrete examples of maps which are not PPT but have finite index of separability. We prove that there is a large class of unital channels that are asymptotically entanglement breaking. This analysis is motivated by the PPT-squared conjecture made by M. Christandl that says every PPT channel, when composed with itself, becomes entanglement breaking.
- We explore the possibility of efficient classical simulation of linear optics experiments under the effect of particle losses. Specifically, we investigate the canonical boson sampling scenario in which an $n$-particle Fock input state propagates through a linear-optical network and is subsequently measured by particle-number detectors in the $m$ output modes. We examine two models of losses. In the first model a fixed number of particles is lost. We prove that in this scenario the output statistics can be well approximated by an efficient classical simulation, provided that the number of photons that is left grows slower than $\sqrt{n}$. In the second loss model, every time a photon passes through a beamsplitter in the network, it has some probability of being lost. For this model the relevant parameter is $s$, the smallest number of beamsplitters that any photon traverses as it propagates through the network. We prove that it is possible to approximately simulate the output statistics already if $s$ grows logarithmically with $m$, regardless of the geometry of the network. The latter result is obtained by proving that it is always possible to commute $s$ layers of uniform losses to the input of the network regardless of its geometry, which could be a result of independent interest. We believe that our findings put strong limitations on future experimental realizations of quantum computational supremacy proposals based on boson sampling.
- We extend the concept of strange correlators, defined for symmetry-protected phases in [You et al., Phys. Rev. Lett. 112, 247202 (2014)], to topological phases of matter by taking the inner product between string-net ground states and product states. The resulting two-dimensional partition functions are shown to be either critical or symmetry broken, as the corresponding transfer matrices inherit all matrix product operator symmetries of the string-net states. For the case of critical systems, those non-local matrix product operator symmetries are the lattice remnants of topological conformal defects in the field theory description. Following [Aasen et al., J. Phys. A 49, 354001 (2016)], we argue that the different conformal boundary conditions can be obtained by applying the strange correlator concept to the different topological sectors of the string-net obtained from Ocneanu's tube algebra. This is demonstrated by calculating the conformal field theory spectra on the lattice in the different topological sectors for the Fibonacci/hard-hexagon and Ising string-net. Additionally, we provide a complementary perspective on symmetry-preserving real-space renormalization by showing how known tensor network renormalization methods can be understood as the approximate truncation of an exactly coarse-grained strange correlator.
- Jan 17 2018 quant-ph arXiv:1801.05185v1The accessible information and the informational power quantify the maximum amount of information that can be extracted from a quantum ensemble and by a quantum measurement, respectively. Here, we investigate the tradeoff between the accessible information (informational power, respectively) and the purity of the states of the ensemble (the elements of the measurement, respectively). Under any given lower bound on the purity, i) we compute the minimum informational power and show that it is attained by the depolarized uniformly-distributed measurement; ii) we give a lower bound on the accessible information. Under any given upper bound on the purity, i) we compute the maximum accessible information and show that it is attained by an ensemble of pairwise commuting states with at most two distinct non-null eigenvalues; ii) we give a lower bound on the maximum informational power. The present results provide, as a corollary, novel sufficient conditions for the tightness of the Jozsa-Robb-Wootters lower bound to the accessible information.
- Jan 17 2018 quant-ph arXiv:1801.05123v1The use of imaginary numbers in modelling quantum mechanical systems encompasses the wave-like nature of quantum states. Here we introduce a resource theoretic framework for imaginarity, where the free states are taken to be those with density matrices that are real with respect to a fixed basis. This theory is closely related to the resource theory of coherence, as it is basis dependent, and the imaginary numbers appear in the off-diagonal elements of the density matrix. Unlike coherence however, the set of physically realizable free operations is identical to both completely resource non-generating operations, and stochastically resource non-generating operations. Moreover, the resource theory of imaginarity does not have a self-adjoint resource destroying map. After introducing and characterizing the free operations, we provide several measures of imaginarity, and give necessary and sufficient conditions for pure state transformations via physically consistent free operations in the single shot regime.
- We introduce a driven-dissipative two-mode bosonic system whose reservoir causes simultaneous loss of two photons in each mode and whose steady states are superpositions of pair-coherent/Barut-Girardello coherent states. We show how quantum information encoded in a steady-state subspace of this system is exponentially immune to phase drifts (cavity dephasing) in both modes. Additionally, it is possible to protect information from arbitrary photon loss in either (but not simultaneously both) of the modes by continuously monitoring the difference between the expected photon numbers of the logical states. Despite employing more resources, the two-mode scheme enjoys two advantages over its one-mode counterpart with regards to implementation using current circuit QED technology. First, monitoring the photon number difference can be done without turning off the currently implementable dissipative stabilizing process. Second, a lower average photon number per mode is required to enjoy a level of protection at least as good as that of the cat-codes. We discuss circuit QED proposals to stabilize the code states, perform gates, and protect against photon loss via either active syndrome measurement or an autonomous procedure. We introduce quasiprobability distributions allowing us to represent two-mode states of fixed photon number difference in a two-dimensional complex plane, instead of the full four-dimensional two-mode phase space. The two-mode codes are generalized to multiple modes in an extension of the stabilizer formalism to non-diagonalizable stabilizers. The $M$-mode codes can protect against either arbitrary photon losses in up to $M-1$ modes or arbitrary losses or gains in any one mode.
- Jan 17 2018 quant-ph arXiv:1801.05283v1A quantum computer has the potential to effciently solve problems that are intractable for classical computers. Constructing a large-scale quantum processor, however, is challenging due to errors and noise inherent in real-world quantum systems. One approach to this challenge is to utilize modularity--a pervasive strategy found throughout nature and engineering--to build complex systems robustly. Such an approach manages complexity and uncertainty by assembling small, specialized components into a larger architecture. These considerations motivate the development of a quantum modular architecture, where separate quantum systems are combined via communication channels into a quantum network. In this architecture, an essential tool for universal quantum computation is the teleportation of an entangling quantum gate, a technique originally proposed in 1999 which, until now, has not been realized deterministically. Here, we experimentally demonstrate a teleported controlled-NOT (CNOT) operation made deterministic by utilizing real-time adaptive control. Additionally, we take a crucial step towards implementing robust, error-correctable modules by enacting the gate between logical qubits, encoding quantum information redundantly in the states of superconducting cavities. Such teleported operations have significant implications for fault-tolerant quantum computation, and when realized within a network can have broad applications in quantum communication, metrology, and simulations. Our results illustrate a compelling approach for implementing multi-qubit operations on logical qubits within an error-protected quantum modular architecture.
- This is a very brief introduction to quantum computing and quantum information theory, primarily aimed at geometers. Beyond basic definitions and examples, I emphasize aspects of interest to geometers, especially connections with asymptotic representation theory. Proofs of most statements can be found in standard references.
- Jan 17 2018 quant-ph arXiv:1801.05057v1We present a simple protocol for certifying graph states in quantum networks using stabiliser measurements. The certification statements can easily be applied to different protocols using graph states. We see for example how it can be used to for measurement based verified quantum compu- tation, certified sampling of random unitaries and quantum metrology and sharing quantum secrets over untrusted channels.
- We present a theory of quantum gravity that combines a geometrical formulation of quantum field theory in space-time with classical Einstein's general relativity. This approach is based on the geometrization of quantum mechanics proposed in refs.[1,2] and combines quantum and gravitational effects into a global curvature of the Finsler's space associated to the 4N-dimensional configuration space of a N-particle system. In order to make this theory compatible with general relativity, the quantum effects are described in the framework of quantum field theory, where a covariant definition of 'simultaneity' for many-body systems is introduced through the definition of a suited foliation of space-time. As for Einstein's classical gravitation theory, the particles dynamics is finally described by means of a geodesic equation in a curved space-time manifold.
- Jan 17 2018 hep-th arXiv:1801.05289v1In this paper we propose a space-time random tensor network approach for understanding holographic duality. Using tensor networks with random link projections, we define boundary theories with interesting holographic properties, such as the Renyi entropies satisfying the covariant Hubeny-Rangamani-Takayanagi formula, and operator correspondence with local reconstruction properties. We also investigate the unitarity of boundary theory in spacetime geometries with Lorenzian signature. Compared with the spatial random tensor networks, the space-time generalization does not require a particular time slicing, and provides a more covariant family of microscopic models that may help us to understand holographic duality.
- We introduce a general framework for de Finetti reduction results, applicable to various notions of partially exchangeable probability distributions. Explicit statements are derived for the cases of exchangeability, Markov exchangeability, and some generalizations of these. Our techniques are combinatorial and rely on the "BEST" theorem, enumerating the Eulerian cycles of a multigraph.
- Jan 17 2018 cs.CC arXiv:1801.05202v1In this paper we propose models of combinatorial algorithms for the Boolean Matrix Multiplication (BMM), and prove lower bounds on computing BMM in these models. First, we give a relatively relaxed combinatorial model which is an extension of the model by Angluin (1976), and we prove that the time required by any algorithm for the BMM is at least $\Omega(n^3 / 2^{O( \sqrt{ \log n })})$. Subsequently, we propose a more general model capable of simulating the "Four Russians Algorithm". We prove a lower bound of $\Omega(n^{7/3} / 2^{O(\sqrt{ \log n })})$ for the BMM under this model. We use a special class of graphs, called $(r,t)$-graphs, originally discovered by Rusza and Szemeredi (1978), along with randomization, to construct matrices that are hard instances for our combinatorial models.
- In this paper, we consider a general stochastic optimization problem which is often at the core of supervised learning, such as deep learning and linear classification. We consider a standard stochastic gradient descent (SGD) method with a fixed, large step size and propose a novel assumption on the objective function, under which this method has the improved convergence rates (to a neighborhood of the optimal solutions). We then empirically demonstrate that these assumptions hold for logistic regression and standard deep neural networks on classical data sets. Thus our analysis helps to explain when efficient behavior can be expected from the SGD method in training classification models and deep neural networks.
- An efficient variational approach is developed to solve in- and out-of-equilibrium problems of generic quantum spin-impurity systems. Employing the discrete symmetry hidden in spin-impurity models, we present a new canonical transformation that completely decouples the impurity and bath degrees of freedom. Combining it with Gaussian states, we present a family of many-body states to efficiently encode nontrivial impurity-bath correlations. We demonstrate its successful application to the anisotropic and two-lead Kondo models by studying their spatiotemporal dynamics, universal nonperturbative scaling and transport phenomena, and compare to other analytical and numerical approaches. In particular, we apply our method to study new types of nonequilibrium phenomena that have not been studied by other methods, such as long-time crossover in the ferromagnetic easy-plane Kondo model. Our results can be tested in experiments with mesoscopic electron systems and ultracold atoms in optical lattices.
- The weight decision problem, which requires to determine the Hamming weight of a given binary string, is a natural and important problem, with applications in cryptanalysis, coding theory, fault-tolerant circuit design and so on. In particular, both Deutsch-Jozsa problem and Grover search problem can be interpreted as special cases of weight decision problems. In this work, we investigate the exact quantum query complexity of weight decision problems, where the quantum algorithm must always output the correct answer. More specifically we consider a partial Boolean function which distinguishes whether the Hamming weight of the length-$n$ input is $k$ or it is $l$. Our contribution includes both upper bounds and lower bounds for the precise number of queries. Furthermore, for most choices of $(\frac{k}{n},\frac{l}{n})$ and sufficiently large $n$, the gap between our upper and lower bounds is no more than one. To get the results, we first build the connection between Chebyshev polynomials and our problem, then determine all the boundary cases of $(\frac{k}{n},\frac{l}{n})$ with matching upper and lower bounds, and finally we generalize to other cases via a new \emphquantum padding technique. This quantum padding technique can be of independent interest in designing other quantum algorithms.
- We show that braidings on a fusion category $\mathcal{C}$ correspond to certain fusion subcategories of the center of $\mathcal{C}$ transversal to the canonical Lagrangian algebra. This allows to classify braidings on non-degenerate and group-theoretical fusion categories.
- Jan 19 2018 cs.LG arXiv:1801.05927v1In this paper we try to organize machine teaching as a coherent set of ideas. Each idea is presented as varying along a dimension. The collection of dimensions then form the problem space of machine teaching, such that existing teaching problems can be characterized in this space. We hope this organization allows us to gain deeper understanding of individual teaching problems, discover connections among them, and identify gaps in the field.
- Jan 18 2018 quant-ph arXiv:1801.05798v1This paper reconstructs quantum theory from operational principles starting from something we dub an effect theory, a generalisation of operational probabilistic theories that does not have any monoidal structure and does not a priori refer to the real numbers, allowing a study of more exotic theories. Our first postulates require the existence of initial filters and final compressions for effects, a variation of the ideal compression axiom of Chiribella et al. (2011). Restricting to the standard operational setting (real probabilities, finite-dimensional) we show that this leads to the existence of spectral decompositions of effects in terms of sharp effects. In the presence of two additional postulates that are relatively standard (composition of pure maps being pure again, and the existence of a transitive symmetry group) we show that the systems must be Euclidean Jordan algebras. Finally we add an "observability of energy" condition (Barnum et al. 2014) to complete the reconstruction of quantum theory.
- We construct examples of translationally invariant solvable models of strongly-correlated metals, composed of lattices of Sachdev-Ye-Kitaev dots with identical local interactions. These models display crossovers as a function of temperature into regimes with local quantum criticality and marginal-Fermi liquid behavior. In the marginal Fermi liquid regime, the dc resistivity increases linearly with temperature over a broad range of temperatures. By generalizing the form of interactions, we also construct examples of non-Fermi liquids with critical Fermi-surfaces. The self energy has a singular frequency dependence, but lacks momentum dependence, reminiscent of a dynamical mean field theory-like behavior but in dimensions $d<\infty$. In the low temperature and strong-coupling limit, a heavy Fermi liquid is formed. The critical Fermi-surface in the non-Fermi liquid regime gives rise to quantum oscillations in the magnetization as a function of an external magnetic field in the absence of quasiparticle excitations. We discuss the implications of these results for local quantum criticality and for fundamental bounds on relaxation rates. Drawing on the lessons from these models, we formulate conjectures on coarse grained descriptions of a class of intermediate scale non-fermi liquid behavior in generic correlated metals.
- Jan 17 2018 cond-mat.str-el arXiv:1801.05416v1Distinct quantum vacua of topologically ordered states can be tunneled into each other via extended operators. The possible applications include condensed matter and quantum cosmology. We present a straightforward approach to calculate the partition function on various manifolds and ground state degeneracy (GSD), mainly based on continuum/cochain Topological Quantum Field Theories (TQFT), in any dimensions. This information can be related to the counting of extended operators of bosonic/fermionic TQFT. On the lattice scale, anyonic particles/strings live at the ends of line/surface operators. Certain systems in different dimensions are related to each other through dimensional reduction schemes, analogous to decategorification. We consider situations where a TQFT lives on (1) a closed spacetime or (2) a spacetime with boundary, such that both the bulk and boundary are fully-gapped and long-range entangled (LRE). Anyonic excitations can be deconfined on the boundary. We describe such new exotic topological interfaces on which neither particle nor string excitation alone condensed, but only composite objects of extended operators can end (e.g. a string-like composite object formed by a set of particles can end on a special 2+1D boundary of 3+1D bulk). We explore the relations between group extension constructions and partially breaking constructions (e.g. 0-form/higher-form/"composite" breaking) of topological boundaries, after gauging. We comment on the implications of entanglement entropy for some of such LRE systems.
- Jan 17 2018 quant-ph physics.class-ph arXiv:1801.05015v1We present an axiomatic framework for thermodynamics that incorporates information as a fundamental concept. The axioms describe both ordinary thermodynamic processes and those in which information is acquired, used and erased, as in the operation of Maxwell's demon. This system, like previous axiomatic systems for thermodynamics, supports the construction of conserved quantities and an entropy function governing state changes. Here, however, the entropy exhibits both information and thermodynamic aspects. Although our axioms are not based upon probabilistic concepts, a natural and highly useful concept of probability emerges from the entropy function itself. Our abstract system has many models, including both classical and quantum examples.
- Jan 19 2018 quant-ph arXiv:1801.06115v1We analyze the performance of quantum teleportation in terms of average fidelity and fidelity deviation. The average fidelity is defined as the average value of the fidelities over all possible input states and the fidelity deviation is their standard deviation, which is referred to as a concept of fluctuation or universality. In the analysis, we find the condition to optimize both measures under a noisy quantum channel---we here consider the so-called Werner channel. To characterize our results, we introduce a two-dimensional space defined by the aforementioned measures, in which the performance of the teleportation is represented as a point with the channel noise parameter. Through further analysis, we specify some regions drawn for different channel conditions, establishing the connection to the dissimilar contributions of the entanglement to the teleportation and the Bell inequality violation.
- Jan 19 2018 cs.NE arXiv:1801.06105v1Plain recurrent networks greatly suffer from the vanishing gradient problem while Gated Neural Networks (GNNs) such as Long-short Term Memory (LSTM) and Gated Recurrent Unit (GRU) deliver promising results in many sequence learning tasks through sophisticated network designs. This paper shows how we can address this problem in a plain recurrent network by analyzing the gating mechanisms in GNNs. We propose a novel network called the Recurrent Identity Network (RIN) which allows a plain recurrent network to overcome the vanishing gradient problem while training very deep models without the use of gates. We compare this model with IRNNs and LSTMs on multiple sequence modeling benchmarks. The RINs demonstrate competitive performance and converge faster in all tasks. Notably, small RIN models produce 12%--67% higher accuracy on the Sequential and Permuted MNIST datasets and reach state-of-the-art performance on the bAbI question answering dataset.
- To implement convolutional neural networks (CNN) in hardware, the state-of-the-art CNN accelerators pipeline computation and data transfer stages using an off-chip memory and simultaneously execute them on the same timeline. However, since a large amount of feature maps generated during the operation should be transmitted to the off-chip memory, the pipeline stage length is determined by the off-chip data transfer stage. Fusion architectures that can fuse multiple layers have been proposed to solve this problem, but applications such as super-resolution (SR) require a large amount of an on-chip memory because of the high resolution of the feature maps. In this paper, we propose a novel on-chip CNN accelerator for SR to optimize the CNN dataflow in the on-chip memory. First, the convolution loop optimization technique is proposed to prevent using a frame buffer. Second, we develop a combined convolutional layer processor to reduce the buffer size used to store the feature maps. Third, we explore how to perform low-cost multiply-and-accumulate operations in the deconvolutional layer used in SR. Finally, we propose a two-stage quantization algorithm to select the optimized hardware size for the limited number of DSPs to implement the on-chip CNN accelerator. We evaluate our proposed accelerator with FSRCNN, which is most popular as the CNN-based SR algorithm. Experimental results show that the proposed accelerator requires 9.21ms to achieve an output image with the 2560x1440 pixel resolution, which is 36 times faster than the conventional method. In addition, we reduce the on-chip memory usage and DSP usage by 4 times and 1.44 times, respectively, compared to the conventional methods.
- Jan 19 2018 quant-ph arXiv:1801.05943v1We obtain analytic solutions to various models of dissipation of the quantum harmonic oscillator, employing a simple method in the Wigner function Fourier transform description of the system; and study as an exemplification, the driven open quantum harmonic oscillator. The environmental models we use are based on optical master equations for the zero and finite temperature bath and whose open dynamics are described by a Lindblad master equation, and also we use the Caldeira-Leggett model for the high temperature limit, in the the under damped an the over damped case. Under the Wigner Fourier transform or chord function as it has been called, it becomes particularly simple to solve the dynamics of the open oscillator in the sense that the dynamics of the system are reduced to the application of an evolution matrix related to the damped motion of the oscillator.
- Factor analysis, a classical multivariate statistical technique is popularly used as a fundamental tool for dimensionality reduction in statistics, econometrics and data science. Estimation is often carried out via the Maximum Likelihood (ML) principle, which seeks to maximize the likelihood under the assumption that the positive definite covariance matrix can be decomposed as the sum of a low rank positive semidefinite matrix and a diagonal matrix with nonnegative entries. This leads to a challenging rank constrained nonconvex optimization problem. We reformulate the low rank ML Factor Analysis problem as a nonlinear nonsmooth semidefinite optimization problem, study various structural properties of this reformulation and propose fast and scalable algorithms based on difference of convex (DC) optimization. Our approach has computational guarantees, gracefully scales to large problems, is applicable to situations where the sample covariance matrix is rank deficient and adapts to variants of the ML problem with additional constraints on the problem parameters. Our numerical experiments demonstrate the significant usefulness of our approach over existing state-of-the-art approaches.
- Now a days, the major challenge in machine learning is the `Big~Data' challenge. The big data problems due to large number of data points or large number of features in each data point, or both, the training of models have become very slow. The training time has two major components: Time to access the data and time to process the data. In this paper, we have proposed one possible solution to handle the big data problems in machine learning. The focus is on reducing the training time through reducing data access time by proposing systematic sampling and cyclic/sequential sampling to select mini-batches from the dataset. To prove the effectiveness of proposed sampling techniques, we have used Empirical Risk Minimization, which is commonly used machine learning problem, for strongly convex and smooth case. The problem has been solved using SAG, SAGA, SVRG, SAAG-II and MBSGD (Mini-batched SGD), each using two step determination techniques, namely, constant step size and backtracking line search method. Theoretical results prove the same convergence for systematic sampling, cyclic sampling and the widely used random sampling technique, in expectation. Experimental results with bench marked datasets prove the efficacy of the proposed sampling techniques.
- Multilayered artificial neural networks are becoming a pervasive tool in a host of application fields. At the heart of this deep learning revolution are familiar concepts from applied and computational mathematics; notably, in calculus, approximation theory, optimization and linear algebra. This article provides a very brief introduction to the basic ideas that underlie deep learning from an applied mathematics perspective. Our target audience includes postgraduate and final year undergraduate students in mathematics who are keen to learn about the area. The article may also be useful for instructors in mathematics who wish to enliven their classes with references to the application of deep learning techniques. We focus on three fundamental questions: what is a deep neural network? how is a network trained? what is the stochastic gradient method? We illustrate the ideas with a short MATLAB code that sets up and trains a network. We also show the use of state-of-the art software on a large scale image classification problem. We finish with references to the current literature.
- Jan 19 2018 cs.CY arXiv:1801.06055v1Rapport, the close and harmonious relationship in which interaction partners are "in sync" with each other, was shown to result in smoother social interactions, improved collaboration, and improved interpersonal outcomes. In this work, we are first to investigate automatic prediction of low rapport during natural interactions within small groups. This task is challenging given that rapport only manifests in subtle non-verbal signals that are, in addition, subject to influences of group dynamics as well as inter-personal idiosyncrasies. We record videos of unscripted discussions of three to four people using a multi-view camera system and microphones. We analyse a rich set of non-verbal signals for rapport detection, namely facial expressions, hand motion, gaze, speaker turns, and speech prosody. Using facial features, we can detect low rapport with an average precision of 0.7 (chance level at 0.25), while incorporating prior knowledge of participants' personalities can even achieve early prediction without a drop in performance. We further provide a detailed analysis of different feature sets and the amount of information contained in different temporal segments of the interactions.
- Jan 18 2018 cs.CV arXiv:1801.05746v1Pixel-wise image segmentation is demanding task in computer vision. Classical U-Net architectures composed of encoders and decoders are very popular for segmentation of medical images, satellite images etc. Typically, neural network initialized with weights from a network pre-trained on a large data set like ImageNet shows better performance than those trained from scratch on a small dataset. In some practical applications, particularly in medicine and traffic safety, the accuracy of the models is of utmost importance. In this paper, we demonstrate how the U-Net type architecture can be improved by the use of the pre-trained encoder. Our code and corresponding pre-trained weights are publicly available at https://github.com/ternaus/TernausNet. We compare three weight initialization schemes: LeCun uniform, the encoder with weights from VGG11 and full network trained on the Carvana dataset. This network architecture was a part of the winning solution (1st out of 735) in the Kaggle: Carvana Image Masking Challenge.
- Gaussian graphical models are used for determining conditional relationships between variables. This is accomplished by identifying off-diagonal elements in the inverse-covariance matrix that are non-zero. When the ratio of variables (p) to observations (n) approaches one, the maximum likelihood estimator of the covariance matrix becomes unstable and requires shrinkage estimation. Whereas several classical (frequentist) methods have been introduced to address this issue, Bayesian methods remain relatively uncommon in practice and methodological literatures. Here we introduce a Bayesian method for estimating sparse matrices, in which conditional relationships are determined with projection predictive selection. Through simulation and an applied example, we demonstrate that the proposed method often outperforms both classical and alternative Bayesian estimators with respect to frequentist risk and consistently made the fewest false positives.We end by discussing limitations and future directions, as well as contributions to the Bayesian literature on the topic of sparsity.
- Jan 18 2018 cs.CV arXiv:1801.05678v1Owe to the rapid development of deep neural network (DNN) techniques and the emergence of large scale face databases, face recognition has achieved a great success in recent years. During the training process of DNN, the face features and classification vectors to be learned will interact with each other, while the distribution of face features will largely affect the convergence status of network and the face similarity computing in test stage. In this work, we formulate jointly the learning of face features and classification vectors, and propose a simple yet effective centralized coordinate learning (CCL) method, which enforces the features to be dispersedly spanned in the coordinate space while ensuring the classification vectors to lie on a hypersphere. An adaptive angular margin is further proposed to enhance the discrimination capability of face features. Extensive experiments are conducted on six face benchmarks, including those have large age gap and hard negative samples. Trained only on the small-scale CASIA Webface dataset with 460K face images from about 10K subjects, our CCL model demonstrates high effectiveness and generality, showing consistently competitive performance across all the six benchmark databases.
- This paper concerns open-world classification, where the classifier not only needs to classify test examples into seen classes that have appeared in training but also reject examples from unseen or novel classes that have not appeared in training. Specifically, this paper focuses on discovering the hidden unseen classes of the rejected examples. Clearly, without prior knowledge this is difficult. However, we do have the data from the seen training classes, which can tell us what kind of similarity/difference is expected for examples from the same class or from different classes. It is reasonable to assume that this knowledge can be transferred to the rejected examples and used to discover the hidden unseen classes in them. This paper aims to solve this problem. It first proposes a joint open classification model with a sub-model for classifying whether a pair of examples belongs to the same or different classes. This sub-model can serve as a distance function for clustering to discover the hidden classes of the rejected examples. Experimental results show that the proposed model is highly promising.
- Jan 18 2018 cs.CV arXiv:1801.05599v2In this paper, we propose a conceptually simple and geometrically interpretable objective function, i.e. additive margin Softmax (AM-Softmax), for deep face verification. In general, the face verification task can be viewed as a metric learning problem, so learning large-margin face features whose intra-class variation is small and inter-class difference is large is of great importance in order to achieve good performance. Recently, Large-margin Softmax and Angular Softmax have been proposed to incorporate the angular margin in a multiplicative manner. In this work, we introduce a novel additive angular margin for the Softmax loss, which is intuitively appealing and more interpretable than the existing works. We also emphasize and discuss the importance of feature normalization in the paper. Most importantly, our experiments on LFW BLUFR and MegaFace show that our additive margin softmax loss consistently performs better than the current state-of-the-art methods using the same network architecture and training dataset. Our code has also been made available at https://github.com/happynear/AMSoftmax
- Jan 18 2018 cs.CR arXiv:1801.05507v1The growing popularity of cloud-based machine learning raises a natural question about the privacy guarantees that can be provided in such a setting. Our work tackles this problem in the context where a client wishes to classify private images using a convolutional neural network (CNN) trained by a server. Our goal is to build efficient protocols whereby the client can acquire the classification result without revealing their input to the server, while guaranteeing the privacy of the server's neural network. To this end, we design Gazelle, a scalable and low-latency system for secure neural network inference, using an intricate combination of homomorphic encryption and traditional two-party computation techniques (such as garbled circuits). Gazelle makes three contributions. First, we design the Gazelle homomorphic encryption library which provides fast algorithms for basic homomorphic operations such as SIMD (single instruction multiple data) addition, SIMD multiplication and ciphertext permutation. Second, we implement the Gazelle homomorphic linear algebra kernels which map neural network layers to optimized homomorphic matrix-vector multiplication and convolution routines. Third, we design optimized encryption switching protocols which seamlessly convert between homomorphic and garbled circuit encodings to enable implementation of complete neural network inference. We evaluate our protocols on benchmark neural networks trained on the MNIST and CIFAR-10 datasets and show that Gazelle outperforms the best existing systems such as MiniONN (ACM CCS 2017) by 20 times and Chameleon (Crypto Eprint 2017/1164) by 30 times in online runtime. Similarly when compared with fully homomorphic approaches like CryptoNets (ICML 2016) we demonstrate three orders of magnitude faster online run-time.
- We study asymptotic properties of expectation propagation (EP) - a method for approximate inference originally developed in the field of machine learning. Applied to generalized linear models, EP iteratively computes a multivariate Gaussian approximation to the exact posterior distribution. The computational complexity of the repeated update of covariance matrices severely limits the application of EP to large problem sizes. In this paper, we present a rigorous analysis by means of free probability theory that allows us to overcome this computational bottleneck if specific data matrices in the problem fulfill a certain property of asymptotic freeness. We demonstrate the relevance of our approach on the gene selection problem of a microarray dataset.
- We examine Deep Canonically Correlated LSTMs as a way to learn nonlinear transformations of variable length sequences and embed them into a correlated, fixed dimensional space. We use LSTMs to transform multi-view time-series data non-linearly while learning temporal relationships within the data. We then perform correlation analysis on the outputs of these neural networks to find a correlated subspace through which we get our final representation via projection. This work follows from previous work done on Deep Canonical Correlation (DCCA), in which deep feed-forward neural networks were used to learn nonlinear transformations of data while maximizing correlation.
- Jan 17 2018 math.AG arXiv:1801.05337v1This text serves as an introduction to $\mathbb{F}_1$-geometry for the general mathematician. We explain the initial motivations for $\mathbb{F}_1$-geometry in detail, provide an overview of the different approaches to $\mathbb{F}_1$ and describe the main achievements of the field.
- Jan 17 2018 cs.CV arXiv:1801.05269v1Robust cross-seasonal localization is one of the major challenges in long-term visual navigation of autonomous vehicles. In this paper, we exploit recent advances in semantic segmentation of images, i.e., where each pixel is assigned a label related to the type of object it represents, to solve the problem of long-term visual localization. We show that semantically labeled 3D point maps of the environment, together with semantically segmented images, can be efficiently used for vehicle localization without the need for detailed feature descriptors (SIFT, SURF, etc.). Thus, instead of depending on hand-crafted feature descriptors, we rely on the training of an image segmenter. The resulting map takes up much less storage space compared to a traditional descriptor based map. A particle filter based semantic localization solution is compared to one based on SIFT-features, and even with large seasonal variations over the year we perform on par with the larger and more descriptive SIFT-features, and are able to localize with an error below 1 m most of the time.
- Jan 17 2018 quant-ph arXiv:1801.05188v1Clausius inequality has deep implications for reversibility and the arrow of time. Quantum theory is able to extend this result for closed systems by inspecting the trajectory of the density matrix on its manifold. Here we show that this approach can provide an upper and lower bound to the irreversible entropy production for open quantum systems as well. These provide insights on the thermodynamics of the information erasure. Limits of the applicability of our bounds are discussed, and demonstrated in a quantum photonic simulator.
- We derive exact (ensemble-tight) error and erasure exponents for the asymmetric broadcast channel given a random superposition codebook. We consider Forney's optimal decoder for both messages and the message pair for the receiver that decodes both messages. We prove that the optimal decoder designed to decode the pair of messages achieves the optimal trade-off between the total and undetected exponents associated with the optimal decoder for the private message. We propose convex optimization-based procedures to evaluate the exponents efficiently. Numerical examples are presented to illustrate the results.
- Direct policy gradient methods for reinforcement learning and continuous control problems are a popular approach for a variety of reasons: 1) they are easy to implement without explicit knowledge of the underlying model 2) they are an "end-to-end" approach, directly optimizing the performance metric of interest 3) they inherently allow for richly parameterized policies. A notable drawback is that even in the most basic continuous control problem (that of linear quadratic regulators), these methods must solve a non-convex optimization problem, where little is understood about their efficiency from both computational and statistical perspectives. In contrast, system identification and model based planning in optimal control theory have a much more solid theoretical footing, where much is known with regards to their computational and statistical properties. This work bridges this gap showing that (model free) policy gradient methods globally converge to the optimal solution and are efficient (polynomially so in relevant problem dependent quantities) with regards to their sample and computational complexities.
- Jan 17 2018 physics.chem-ph quant-ph arXiv:1801.04924v1We provide a unified theoretical approach to the quantum dynamics of absorption of single photons and subsequent excitonic energy transfer in photosynthetic light-harvesting complexes. Our analysis combines a continuous mode <n>-photon quantum optical master equation for the chromophoric system with the hierarchy of equations of motion describing excitonic dynamics in presence of non-Markovian coupling to vibrations of the chromophores and surrounding protein. We apply the approach to simulation of absorption of single-photon coherent states by pigment-protein complexes containing between one and seven chromophores, and compare with results obtained by excitation using a thermal radiation field. We show that the values of excitation probability obtained under single-photon absorption conditions can be consistently related to bulk absorption cross-sections. Analysis of the timescale and efficiency of single-photon absorption by light-harvesting systems within this full quantum description of pigment-protein dynamics coupled to a quantum radiation field reveals a non-trivial dependence of the excitation probability and the excited state dynamics induced by exciton-phonon coupling during and subsequent to the pulse, on the bandwidth of the incident photon pulse. For bandwidths equal to the spectral bandwidth of Chlorophyll a, our results yield an estimation of an average time of ~0.09 s for a single chlorophyll chromophore to absorb the energy equivalent of one (single-polarization) photon under irradiation by single-photon states at the intensity of sunlight.
- Jan 17 2018 physics.gen-ph quant-ph arXiv:1801.05286v2The complete general fundamental theory of the dynamical hypergraphs whose `all' mathematical structures are quantized, HoloQuantum Network Theory, is originally defined and formulated based upon a unique system of nine principles. HoloQuantum Networks are the quantum states of a (0+1) dimensional purely-information-theoretic quantum many body system, made of a complete set of the distinctively-interacting qubits of the absences-or-presences, which formulate the most complete unitary evolutions of the most general superpositions of the arbitrarily-structured hypergraphs. All the defining interactions and the complete total Hamiltonian of the quantum many body system of HoloQuantum Network Theory are uniquely obtained upon realizing the dynamical hypergraphical well-definedness by all the `cascade operators', the quantum-hypergraphical isomorphisms, the U(1) symmetry for the global-phase redundancies of the multi-qubit wavefunctions, the minimally broken symmetry of the qubits-equal-treatment, a Wheelerian maximal randomness, and the `covariant completeness'. By the axiomatic definition and construction of the theory, HoloQuantum Networks are all the time-dependent purely-information-theoretic wavefunctions, and mixed states, of every in-principle-realizable quantum many body system of the arbitrarily-chosen quantum objects and their arbitrarily-chosen quantum relations. Being so, we propose HoloQuantuam Network Theory as the most-fundamental most-complete form-invariant `it-from-qubit' theory of `All Quantum Natures'.