- May 23 2018 quant-ph arXiv:1805.08385v1 Product formulas can be used to simulate Hamiltonian dynamics on a quantum computer by approximating the exponential of a sum of operators by a product of exponentials of the individual summands. This approach is both straightforward and surprisingly efficient. We show that by simply randomizing how the summands are ordered, one can prove stronger bounds on the quality of approximation and thereby give more efficient simulations. Indeed, we show that these bounds can be asymptotically better than previous bounds that exploit commutation between the summands, despite using much less information about the structure of the Hamiltonian. Numerical evidence suggests that our randomized algorithm may be advantageous even for near-term quantum simulation.
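The core idea can be illustrated numerically. Below is a minimal sketch (not the paper's exact algorithm) of a first-order product formula for a toy 2-qubit Hamiltonian, comparing a fixed summand ordering with one that is freshly permuted in every Trotter step; the specific Hamiltonian and step counts are illustrative choices.

```python
import numpy as np

def expm_herm(H, t):
    """exp(-i*H*t) for a Hermitian matrix H, via eigendecomposition."""
    vals, vecs = np.linalg.eigh(H)
    return (vecs * np.exp(-1j * vals * t)) @ vecs.conj().T

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# Non-commuting summands of a small 2-qubit Hamiltonian (toy example).
terms = [np.kron(X, X), np.kron(Z, I2), np.kron(I2, Z)]
H = sum(terms)
t, r = 1.0, 100  # total evolution time and number of Trotter steps

U_exact = expm_herm(H, t)

# Fixed ordering: repeat the same product exp(-iH_0 dt) exp(-iH_1 dt) ...
step = np.eye(4, dtype=complex)
for j in [0, 1, 2]:
    step = expm_herm(terms[j], t / r) @ step
U_fixed = np.linalg.matrix_power(step, r)

# Randomized ordering: draw an independent permutation in each step.
rng = np.random.default_rng(0)
U_rand = np.eye(4, dtype=complex)
for _ in range(r):
    for j in rng.permutation(3):
        U_rand = expm_herm(terms[j], t / r) @ U_rand

err_fixed = np.linalg.norm(U_fixed - U_exact, 2)
err_rand = np.linalg.norm(U_rand - U_exact, 2)
```

Both products converge to the exact evolution as the step count grows; the paper's contribution is the stronger error bounds one can prove for the averaged, randomized ordering.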
- We show that combining two different hypothetical enhancements to quantum computation---namely, quantum advice and non-collapsing measurements---would let a quantum computer solve any decision problem whatsoever in polynomial time, even though neither enhancement yields extravagant power by itself. This complements a related result due to Raz. The proof uses locally decodable codes.
- May 23 2018 quant-ph arXiv:1805.08227v1 We compare the performance of quantum error correcting codes when memory errors are unitary with the more familiar case of dephasing noise. For a wide range of codes we analytically compute the effective logical channel that results when the error correction steps are performed noiselessly. Our examples include the entire family of repetition codes, the 5-qubit, Steane, Shor, and surface codes. When errors are measured in terms of the diamond norm, we find that the error correction is typically much more effective for unitary errors than for dephasing. We observe this behavior for a wide range of codes after a single level of encoding, and in the thresholds of concatenated codes using hard decoders. We show that this holds with great generality by proving a bound on the performance of any stabilizer code when the noise at the physical level is unitary. By comparing the diamond norm error $D'_\diamond$ of the logical qubit with the same quantity at the physical level $D_\diamond$, we show that $D'_\diamond \le c D^d_\diamond $ where $d$ is the distance of the code and $c$ is a constant that depends on the code but not on the error. This bound compares very favorably to the performance of error correction for dephasing noise and other Pauli channels, where an error correcting code of odd distance $d$ will exhibit a scaling $D'_\diamond \sim D_\diamond^{(d+1)/2}$.
- Quantum mechanics fundamentally forbids deterministic discrimination of quantum states and processes. However, the ability to optimally distinguish various classes of quantum data is an important primitive in quantum information science. In this work, we train near-term quantum circuits to classify data represented by non-orthogonal quantum probability distributions using the Adam stochastic optimization algorithm. This is achieved by iterative interactions of a classical device with a quantum processor to discover the parameters of an unknown non-unitary quantum circuit. This circuit learns to simulate the unknown structure of a generalized quantum measurement, or positive operator-valued measure (POVM), that is required to optimally distinguish possible distributions of quantum inputs. Notably, we use universal circuit topologies, with a theoretically motivated circuit design, which guarantees that our circuits can in principle learn to perform arbitrary input-output mappings. Our numerical simulations show that shallow quantum circuits can be trained to discriminate among various pure and mixed quantum states, exhibiting a trade-off between minimizing erroneous and inconclusive outcomes, with performance comparable to theoretically optimal POVMs. We train the circuit on different classes of quantum data and evaluate the generalization error on unseen mixed quantum states. This generalization power distinguishes our work from standard circuit optimization and provides an example of quantum machine learning for a task that has inherently no classical analogue.
- The quantum chromatic number, $\chi_q(G)$, of a graph $G$ was originally defined as the minimal number of colors necessary in a quantum protocol in which two provers that cannot communicate with each other but share an entangled state can convince an interrogator with certainty that they have a coloring of the graph. We use an equivalent purely combinatorial definition of $\chi_q(G)$ to prove that many spectral lower bounds for the chromatic number, $\chi(G)$, are also lower bounds for $\chi_q(G)$. This is achieved using techniques from linear algebra called pinching and twirling. We illustrate our results with some examples.
- May 23 2018 quant-ph arXiv:1805.08721v1 We describe a general procedure for associating a minimal informationally-complete quantum measurement (or MIC) and a set of linearly independent post-measurement quantum states with a purely probabilistic representation of the Born Rule. Such representations are motivated by QBism, where the Born Rule is understood as a consistency condition between probabilities assigned to the outcomes of one experiment in terms of the probabilities assigned to the outcomes of other experiments. In this setting, the difference between quantum and classical physics is the way their physical assumptions augment bare probability theory: Classical physics corresponds to a trivial augmentation---one just applies the Law of Total Probability (LTP) between the scenarios---while quantum theory makes use of the Born Rule expressed in one or another of the forms of our general procedure. To mark the essential difference between quantum and classical, one should seek the representations that minimize the disparity between the expressions. We prove that the representation of the Born Rule obtained from a symmetric informationally-complete measurement (or SIC) minimizes this distinction in at least two senses---the first to do with unitarily invariant distance measures between the rules, and the second to do with available volume in a reference probability simplex (roughly speaking a new kind of uncertainty principle). Both of these arise from a significant majorization result. This work complements recent studies in quantum computation where the deviation of the Born Rule from the LTP is measured in terms of negativity of Wigner functions.
- Given a quantum many-body system with few-body interactions, how rapidly can quantum information be hidden during time evolution? The fast scrambling conjecture is that the time to thoroughly mix information among N degrees of freedom grows at least logarithmically in N. We derive this inequality for generic quantum systems at infinite temperature, by relating the scrambling time to a finite decay time of local quantum correlations at late times. Using Lieb-Robinson bounds, generalized Sachdev-Ye-Kitaev models, and random unitary circuits, we propose that a logarithmic scrambling time can be achieved in most quantum systems with sparse connectivity. These models also elucidate how quantum chaos is not universally related to scrambling: we construct random few-body circuits with infinite Lyapunov exponent but logarithmic scrambling time. We discuss analogies between quantum models on graphs and quantum black holes, and suggest methods to experimentally study scrambling with as many as 100 sparsely-connected quantum degrees of freedom.
- May 23 2018 cond-mat.stat-mech arXiv:1805.08487v1 We consider a random time evolution operator composed of a circuit of random unitaries coupling even and odd neighbouring spins on a chain in turn. In the spirit of Floquet evolution, the circuit is time-periodic; each timestep is repeated with the same random instances. We obtain analytical results for arbitrary local Hilbert space dimension $d$: on a single site, the average time evolution acts as a depolarising channel. In the spin-1/2 ($d=2$) case, this is further quantified numerically. For that purpose, we develop a new numerical method that reduces the complexity by an exponential factor. Haar-distributed unitaries lead to full depolarisation after many timesteps, i.e. local thermalisation. A unitary probability distribution with tunable coupling strength allows us to observe a many-body localisation transition. In addition to a spin chain under a unitary circuit, we consider the analogous problem with Gaussian circuits. There we can make stronger statements about the entire covariance matrix instead of single sites only, and find that the dynamics is localising. For a random time evolution operator homogeneous in space, however, the system delocalises.
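The single-site depolarising statement is easy to check by Monte Carlo: averaging $U \rho U^\dagger$ over Haar-random unitaries drives any state toward the maximally mixed state $I/d$. The sketch below (an illustration of that fact only, not of the paper's circuit model) samples Haar unitaries via the QR decomposition of a Ginibre matrix.

```python
import numpy as np

def haar_unitary(d, rng):
    """Sample a Haar-random d x d unitary via QR of a complex Gaussian matrix."""
    g = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    q, r = np.linalg.qr(g)
    # Fix the column phases so the distribution is exactly Haar.
    ph = np.diagonal(r) / np.abs(np.diagonal(r))
    return q * ph

d = 2
rho = np.array([[1, 0], [0, 0]], dtype=complex)  # pure state |0><0|
rng = np.random.default_rng(1)

n = 20000
avg = np.zeros((d, d), dtype=complex)
for _ in range(n):
    U = haar_unitary(d, rng)
    avg += U @ rho @ U.conj().T
avg /= n

# Distance from the maximally mixed state; shrinks as n grows.
deviation = np.linalg.norm(avg - np.eye(d) / d)
```

Exactly, $\int dU\, U\rho U^\dagger = \operatorname{Tr}(\rho)\, I/d$; the finite-sample average only approaches this at rate $O(1/\sqrt{n})$.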
- One of the main difficulties in analyzing neural networks is the non-convexity of the loss function which may have many bad local minima. In this paper, we study the landscape of neural networks for binary classification tasks. Under mild assumptions, we prove that after adding one special neuron with a skip connection to the output, or one special neuron per layer, every local minimum is a global minimum.
- We develop the first Bayesian Optimization algorithm, BLOSSOM, which selects between multiple alternative acquisition functions and traditional local optimization at each step. This is combined with a novel stopping condition based on expected regret. This pairing allows us to obtain the best characteristics of both local and Bayesian optimization, making efficient use of function evaluations while yielding superior convergence to the global minimum on a selection of optimization problems, and also halting optimization once a principled and intuitive stopping condition has been fulfilled.
- The Strong Exponential Time Hypothesis and the OV-conjecture are two popular hardness assumptions used to prove a plethora of lower bounds, especially in the realm of polynomial-time algorithms. The OV-conjecture in moderate dimension states that there is no $\epsilon>0$ for which an $O(N^{2-\epsilon})\mathrm{poly}(D)$ time algorithm can decide whether there is a pair of orthogonal vectors in a given set of $N$ binary vectors of dimension $D$. We strengthen the evidence for these hardness assumptions. In particular, we show that if the OV-conjecture fails, then two problems for which we are far from obtaining even tiny improvements over exhaustive search would have surprisingly fast algorithms. If the OV-conjecture is false, then there is a fixed $\epsilon>0$ such that: (1) For all $d$ and all large enough $k$, there is a randomized algorithm that takes $O(n^{(1-\epsilon)k})$ time to solve the Zero-Weight-$k$-Clique and Min-Weight-$k$-Clique problems on $d$-hypergraphs with $n$ vertices. As a consequence, the OV-conjecture is implied by the Weighted Clique conjecture. (2) For all $c$, the satisfiability of sparse TC$^1$ circuits on $n$ inputs (that is, circuits with $cn$ wires, depth $c\log n$, and negation, AND, OR, and threshold gates) can be computed in time $O((2-\epsilon)^n)$.
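For concreteness, the exhaustive-search baseline that the OV-conjecture asserts cannot be substantially beaten is simply the $O(N^2 \cdot D)$ pairwise check below (a toy sketch; the vectors and instances are made up).

```python
def has_orthogonal_pair(vectors):
    """Return True iff some pair u, v in `vectors` has inner product 0.

    Brute force over all pairs: O(N^2 * D) for N binary vectors of
    dimension D -- exactly the running time the OV-conjecture says
    cannot be improved to O(N^(2-eps)) * poly(D).
    """
    n = len(vectors)
    for i in range(n):
        for j in range(i + 1, n):
            if all(a * b == 0 for a, b in zip(vectors[i], vectors[j])):
                return True
    return False

# (1,0,1) and (0,1,0) are orthogonal.
found = has_orthogonal_pair([[1, 0, 1], [0, 1, 0], [1, 1, 1]])
# Every pair here shares a common 1-coordinate.
none = has_orthogonal_pair([[1, 1, 0], [0, 1, 1], [1, 0, 1]])
```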
- Feature hashing, also known as the hashing trick, introduced by Weinberger et al. (2009), is one of the key techniques used in scaling up machine learning algorithms. Loosely speaking, feature hashing uses a random sparse projection matrix $A : \mathbb{R}^n \to \mathbb{R}^m$ (where $m \ll n$) in order to reduce the dimension of the data from $n$ to $m$ while approximately preserving the Euclidean norm. Every column of $A$ contains exactly one non-zero entry, equal to either $-1$ or $1$. Weinberger et al. showed tail bounds on $\|Ax\|_2^2$. Specifically, they showed that for every $\varepsilon, \delta$, if $\|x\|_{\infty} / \|x\|_2$ is sufficiently small, and $m$ is sufficiently large, then $$\Pr[ \; | \;\|Ax\|_2^2 - \|x\|_2^2\; | < \varepsilon \|x\|_2^2 \;] \ge 1 - \delta \;.$$ These bounds were later extended by Dasgupta et al. (2010) and most recently refined by Dahlgaard et al. (2017); however, the true nature of the performance of this key technique, and specifically the correct tradeoff between the pivotal parameters $\|x\|_{\infty} / \|x\|_2, m, \varepsilon, \delta$, remained an open question. We settle this question by giving tight asymptotic bounds on the exact tradeoff between the central parameters, thus providing a complete understanding of the performance of feature hashing. We complement the asymptotic bound with empirical data, which shows that the constants "hiding" in the asymptotic notation are, in fact, very close to $1$, thus further illustrating the tightness of the presented bounds in practice.
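A minimal sketch of the projection just described: each input coordinate $i$ is assigned a random bucket $h(i)$ and a random sign $s(i)$, so the implicit matrix $A$ has exactly one $\pm 1$ per column. (Building $A$ explicitly is for illustration only; in practice one streams over the nonzero features.)

```python
import numpy as np

def feature_hash_matrix(n, m, rng):
    """Explicit m x n feature-hashing matrix: one random +/-1 per column."""
    A = np.zeros((m, n))
    buckets = rng.integers(0, m, size=n)     # h(i): target bucket of coordinate i
    signs = rng.choice([-1.0, 1.0], size=n)  # s(i): random sign of coordinate i
    A[buckets, np.arange(n)] = signs
    return A

rng = np.random.default_rng(0)
n, m = 1000, 64
A = feature_hash_matrix(n, m, rng)
nonzeros_per_column = (A != 0).sum(axis=0)

# For a "spread-out" x (small ||x||_inf / ||x||_2), the squared norm is
# approximately preserved: E[||Ax||^2] = ||x||^2.
x = rng.standard_normal(n)
ratio = np.linalg.norm(A @ x) ** 2 / np.linalg.norm(x) ** 2
```

The paper's contribution is the exact tradeoff between $\|x\|_\infty/\|x\|_2$, $m$, $\varepsilon$, and $\delta$ governing how tightly this `ratio` concentrates around 1.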
- This paper contributes to cross-lingual image annotation and retrieval in terms of data and methods. We propose COCO-CN, a novel dataset enriching MS-COCO with manually written Chinese sentences and tags. For more effective annotation acquisition, we develop a recommendation-assisted collective annotation system, automatically providing an annotator with several tags and sentences deemed to be relevant with respect to the pictorial content. Having 20,342 images annotated with 27,218 Chinese sentences and 70,993 tags, COCO-CN is currently the largest Chinese-English dataset applicable for cross-lingual image tagging, captioning and retrieval. We develop methods per task for effectively learning from cross-lingual resources. Extensive experiments on the multiple tasks justify the viability of our dataset and methods.
- May 23 2018 quant-ph arXiv:1805.08642v1 The far-field radiation pattern of three dipole-coupled two-level atoms is shown to yield sub- and super-radiant behavior, with the nature of the light quanta controlled by the underlying quantum correlations. Superradiance is found to faithfully reflect the monogamy of quantum correlations and is robust against thermal effects. It persists at finite temperature with reduced intensity, even in the absence of entanglement but with non-zero quantum discord. The intensity of emitted radiation is highly focused and anisotropic in one phase and completely uniform in another, with the two phases separated by a crossover. The radiation intensity is shown to exhibit periodic variation from super- to sub-radiant behavior, as a function of interatomic spacing and observation angle, which persists up to significantly high temperatures. The precise effects of the transition frequency and inter-dipole spacing on the angular spread and variations of the intensity in the uniform and non-uniform regimes are explicitly demonstrated at finite temperature. The photon-photon correlation is shown to exhibit sub- and super-Poissonian statistics in a parametrically controlled manner.
- Although interactive learning puts the user into the loop, the learner remains mostly a black box for the user. Understanding the reasons behind queries and predictions is important when assessing how the learner works and, in turn, deciding whether to trust it. Consequently, we propose the novel framework of explanatory interactive learning: in each step, the learner explains its interactive query to the user, and the user can inspect any active classifier by visualizing explanations of the corresponding predictions. We demonstrate that this can boost the predictive and explanatory power of, and the trust in, the learned model, using text (e.g. SVMs) and image classification (e.g. neural networks) experiments as well as a user study.
- May 23 2018 cs.CV arXiv:1805.08542v1 Many computer vision and image processing applications rely on local features. It is well known that motion blur decreases the performance of traditional feature detectors and descriptors. We propose an inertial-based deblurring method for improving the robustness of existing feature detectors and descriptors against motion blur. Unlike most deblurring algorithms, the method can handle spatially-variant blur and rolling shutter distortion. Furthermore, it is capable of running in real time, in contrast to state-of-the-art algorithms. The limitations of inertial-based blur estimation are taken into account by validating the blur estimates using image data. The evaluation shows that when the method is used with a traditional feature detector and descriptor, it increases the number of detected keypoints, provides higher repeatability and improves the localization accuracy. We also demonstrate that such features lead to more accurate and complete reconstructions when used in 3D visual reconstruction.
- We show that it is possible to obtain an $O(\epsilon^{-4/3})$ runtime --- including computational cost --- for finding $\epsilon$-stationary points of nonconvex functions using cutting plane methods. This improves on the best known $\epsilon$ dependence of $O(\epsilon^{-3/2})$, achieved by cubic regularized Newton, as proved by Nesterov and Polyak (2006). Our techniques utilize the "convex until proven guilty" principle proposed by Carmon, Duchi, Hinder, and Sidford (2017).
- We study a recent model of collaborative PAC learning where $k$ players with $k$ different tasks collaborate to learn a single classifier that works for all tasks. Previous work showed that when there is a classifier that has very small error on all tasks, there is a collaborative algorithm that finds a single classifier for all tasks and it uses $O(\ln^2 (k))$ times the sample complexity to learn a single task. In this work, we design new algorithms for both the realizable and the non-realizable settings using only $O(\ln (k))$ times the sample complexity to learn a single task. The sample complexity upper bounds of our algorithms match previous lower bounds and in some range of parameters are even better than previous algorithms that are allowed to output different classifiers for different tasks.
- May 23 2018 quant-ph cond-mat.stat-mech arXiv:1805.08305v1 Quantum open systems evolve according to completely positive, trace-preserving maps acting on the density operator, which can equivalently be unraveled in terms of so-called quantum trajectories. These stochastic sequences of pure states correspond to the actual dynamics of the quantum system during single realizations of an experiment in which the system's environment is monitored. In this chapter, we present an extension of stochastic thermodynamics to the case of open quantum systems, which builds on the analogy between quantum trajectories and the trajectories in phase space of classical stochastic thermodynamics. We analyze entropy production, work and heat exchanges at the trajectory level, identifying genuinely quantum contributions due to decoherence induced by the environment. We present three examples: the thermalization of a quantum system, the fluorescence of a driven qubit and the continuous monitoring of a qubit's observable.
- May 23 2018 cond-mat.str-el arXiv:1805.08232v1 We propose graphene based moiré super-lattice systems to realize nearly flat Chern bands. We study a number of systems of twisted double layers with small twist angle. Each layer is chosen to be either AB stacked bilayer graphene (BG), ABC stacked trilayer graphene (TG), or hexagonal boron nitride (h-BN) (specifically the following twisted systems: BG/h-BN, TG/h-BN, BG/BG, TG/TG, TG/BG). In these systems a vertically applied electric field enables control of the bandwidth and, interestingly, also the Chern number. We find that the Chern numbers of the bands associated with each of the two microscopic valleys can be $0, \pm 1, \pm 2, \pm 3$ depending on the specific system and the vertical electric field. Fully filling these bands ($\nu_T = 4$ electrons/moiré site) leads to a Quantum Valley Hall state. At partial band filling, Coulomb interactions are expected to play an important role. At integer electron fillings (per moiré site) $\nu_T = 1, 2, 3$, the natural states are spontaneously spin/valley polarized to give fully filled bands. This leads to Anomalous Integer Quantum (Valley) Hall states with Chern number $C \geq 1$. At fractional filling, Fractional Quantum Anomalous Hall insulators or Fractional Quantum Valley Hall insulators (both abelian and non-abelian) are possible when the interaction is strong enough. For Chern number $|C|=2$, unconventional superconductivity may arise from condensation of charge $2e$ bosonic skyrmions upon doping the spin and valley polarized Integer Quantum Anomalous Hall insulator with Hall conductivity $\sigma^c_{xy}=2\frac{e^2}{h}$ at filling $\nu_T = 1, 3$. We also discuss conceptual similarities and implications for modeling twisted bilayer graphene systems.
- A Convolutional Feature Map based Deep Network targeted towards Traffic Detection and Classification. May 23 2018 cs.CV arXiv:1805.08769v1 This research focuses on traffic detection, essentially involving object detection and classification. The particular work discussed here is motivated by unsatisfactory attempts to re-use well-known pre-trained object detection networks for domain-specific data. In the process, some trivial issues leading to a prominent performance drop are identified and ways to resolve them are discussed. For example, some simple yet relevant tricks regarding data collection and sampling prove to be very beneficial. Also, introducing a blur net to deal with blurred real-time data is another important factor in improving performance. We further study neural network design issues for beneficial object classification and employ shared, region-independent convolutional features. Adaptive learning rates to deal with saddle points are also investigated, and an average-covariance-matrix-based pre-conditioned approach is proposed. We also introduce the use of optical flow features to accommodate orientation information. Experimental results demonstrate a steady rise in performance.
- Currently, progressively larger deep neural networks are trained on ever growing data corpora. As this trend is only going to increase in the future, distributed training schemes are becoming increasingly relevant. A major issue in distributed training is the limited communication bandwidth between contributing nodes, or prohibitive communication cost in general. These challenges become even more pressing as the number of computation nodes increases. To counteract this development we propose sparse binary compression (SBC), a compression framework that allows for a drastic reduction of communication cost for distributed training. SBC combines existing techniques of communication delay and gradient sparsification with a novel binarization method and optimal weight update encoding to push compression gains to new limits. By doing so, our method also allows us to smoothly trade off gradient sparsity and temporal sparsity to adapt to the requirements of the learning task. Our experiments show that SBC can reduce the upstream communication on a variety of convolutional and recurrent neural network architectures by more than four orders of magnitude without significantly harming the convergence speed in terms of forward-backward passes. For instance, we can train ResNet50 on ImageNet to the baseline accuracy in the same number of iterations using $3531\times$ fewer bits, or train it to a $1\%$ lower accuracy using $37208\times$ fewer bits. In the latter case, the total upstream communication required is cut from 125 terabytes to 3.35 gigabytes for every participating client.
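The sparsification-plus-binarization idea can be sketched as follows (a simplification for illustration, not the exact SBC encoder): keep only the $k$ largest-magnitude gradient entries and replace each survivor by a single shared magnitude with its original sign, so each kept entry costs one sign bit plus an index.

```python
import numpy as np

def sparse_binary_compress(grad, k):
    """Top-k sparsification followed by sign-magnitude binarization.

    All kept entries share one magnitude (the mean absolute value of the
    top-k), so the message is k indices, k sign bits, and one float.
    """
    out = np.zeros_like(grad)
    idx = np.argsort(np.abs(grad))[-k:]   # indices of the k largest magnitudes
    mu = np.abs(grad[idx]).mean()         # single shared magnitude
    out[idx] = np.sign(grad[idx]) * mu
    return out

rng = np.random.default_rng(0)
g = rng.standard_normal(10_000)           # stand-in for a gradient vector
k = 100
g_hat = sparse_binary_compress(g, k)

sparsity = int((g_hat != 0).sum())
distinct_magnitudes = len(set(np.round(np.abs(g_hat[g_hat != 0]), 12)))
```

Here only 1% of the entries survive and they all share one magnitude, which is what makes the aggressive bit-level encoding reported in the abstract possible.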
- May 23 2018 math.NA arXiv:1805.08766v1 In recent work we have developed a renormalization framework for stabilizing reduced order models for time-dependent partial differential equations. We have applied this framework to the open problem of finite-time singularity formation (blow-up) for the 3D Euler equations of incompressible fluid flow. The renormalized coefficients in the reduced order models decay algebraically with time and resolution. Our results for the behavior of the solutions are consistent with the formation of a finite-time singularity.
- Efficient computability is an important property of solution concepts in matching markets. We consider the computational complexity of finding and verifying various solution concepts in trading networks---multi-sided matching markets with bilateral contracts---under the assumption of full substitutability of agents' preferences. First, we show that outcomes that satisfy an economically intuitive solution concept---trail stability---always exist and can be found in linear time. Second, we consider a slightly stronger solution concept in which agents can simultaneously offer an upstream and a downstream contract. We show that deciding the existence of outcomes satisfying this solution concept is an NP-complete problem even in a special (flow network) case of our model. It follows that the existence of stable outcomes---immune to deviations by arbitrary sets of agents---is also an NP-complete problem in trading networks (and in flow networks). Finally, we show that even verifying whether a given outcome is stable is NP-complete in trading networks.
- May 23 2018 math.OC arXiv:1805.08756v1 We prove that a simple Sequential Quadratic Programming (SQP) algorithm for equality constrained optimization has local linear convergence with rate $1 - 1/\kappa_R$, where $\kappa_R$ is the condition number of the Riemannian Hessian. Our analysis builds on insights from Riemannian optimization and indicates that first-order Riemannian algorithms and "simple" SQP algorithms have nearly identical local behavior. Unlike Riemannian algorithms, SQP avoids calculating the projection or retraction of the points onto the manifold at each iterate. All the SQP iterates are automatically quadratically close to the manifold near the local minimizer.
- We study the fundamental problem of distributed energy-aware network formation with mobile agents of limited computational power that have the capability to wirelessly transmit and receive energy in a peer-to-peer manner. Specifically, we design simple distributed protocols consisting of a small number of states and interaction rules for the construction of both arbitrary and binary trees. Further, we theoretically and experimentally evaluate a plethora of energy redistribution protocols that exploit different levels of knowledge in order to achieve desired energy distributions which require, for instance, that every agent has twice the energy of the agents of higher depth (according to the tree network). Our study shows that without using any knowledge about the network structure, such energy distributions cannot be achieved in a timely manner, which means that there might be high energy loss during the redistribution process. On the other hand, only a few extra bits of information seem to be enough to guarantee quick convergence to energy distributions that satisfy particular properties, yielding low energy loss.
- In recent years, due to the booming development of online social networks, fake news for various commercial and political purposes has been appearing in large numbers and spreading widely in the online world. With deceptive words, online social network users can easily be misled by such fake news, which has already had a tremendous effect on offline society. An important goal in improving the trustworthiness of information in online social networks is to identify fake news in a timely manner. This paper investigates the principles, methodologies and algorithms for detecting fake news articles, creators and subjects from online social networks, and evaluates the corresponding performance. It addresses the challenges introduced by the unknown characteristics of fake news and the diverse connections among news articles, creators and subjects. Based on a detailed data analysis, this paper introduces a novel automatic fake news credibility inference model, namely FakeDetector. Based on a set of explicit and latent features extracted from the textual information, FakeDetector builds a deep diffusive network model to learn the representations of news articles, creators and subjects simultaneously. Extensive experiments on a real-world fake news dataset compare FakeDetector with several state-of-the-art models, and the results demonstrate the effectiveness of the proposed model.
- Programming has been an important skill for researchers and practitioners in computer science and other related areas. To learn basic programming skills, long-term systematic training is usually required for beginners. According to a recent market report, the computer software market is expected to continue expanding at an accelerating speed, but the market supply of qualified software developers can hardly meet such a huge demand. In recent years, the surge of text generation research provides opportunities to address this dilemma through automatic program synthesis. In this paper, we approach the program synthesis problem from a data mining perspective. To address the problem, a novel generative model, namely EgoCoder, is introduced. EgoCoder effectively parses program code into abstract syntax trees (ASTs), whose nodes contain the program code/comment content and whose structure captures the program logic flows. Based on a new unit model called Hsu, EgoCoder can effectively capture both the hierarchical and sequential patterns in the program ASTs. Extensive experiments are conducted to compare EgoCoder with state-of-the-art text generation methods, and the experimental results demonstrate the effectiveness of EgoCoder in addressing the program synthesis problem.
- May 23 2018 math.AG arXiv:1805.08745v1 This dissertation comprises three collections of results, all united by a common theme. The theme is the study of categories via algebraic techniques, considering categories themselves as algebraic objects. This algebraic approach to category theory is central to noncommutative algebraic geometry, as realized by recent advances in the study of noncommutative motives. We succeed in proving algebraic results in the general setting of symmetric monoidal and semiring $\infty$-categories, which categorify abelian groups and rings, respectively. For example, we prove that modules over the semiring category Fin of finite sets are cocartesian monoidal $\infty$-categories, and modules over Burn (the Burnside $\infty$-category) are additive $\infty$-categories. As a consequence, we can regard Lawvere theories as cyclic $\text{Fin}^\text{op}$-modules, leading to algebraic foundations for the higher categorical study of Lawvere theories. We prove that Lawvere theories function as a home for an algebraic Yoneda lemma. Finally, we provide evidence for a formal duality between naive and genuine equivariant homotopy theory, in the form of a group-theoretic Eilenberg-Watts Theorem. This sets up a parallel between equivariant homotopy theory and motivic homotopy theory, where Burnside constructions are analogous to Morita theory. We conjecture that this relationship could be made precise within the context of noncommutative motives over the field with one element.
- This work presents CascadeCNN, an automated toolflow that pushes the quantisation limits of any given CNN model, to perform high-throughput inference by exploiting the computation time-accuracy trade-off. Without the need for retraining, a two-stage architecture tailored for any given FPGA device is generated, consisting of a low- and a high-precision unit. A confidence evaluation unit is employed between them to identify misclassified cases at run time and forward them to the high-precision unit or terminate computation. Experiments demonstrate that CascadeCNN achieves a performance boost of up to 55% for VGG-16 and 48% for AlexNet over the baseline design for the same resource budget and accuracy.
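The two-stage routing logic can be sketched with stand-in classifiers (hypothetical softmax outputs in place of the quantised CNN stages; the threshold value is an illustrative choice): the cheap low-precision stage answers first, and only inputs whose top score falls below a confidence threshold are forwarded to the expensive high-precision stage.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cascade_predict(logits_low, logits_high, threshold=0.9):
    """Two-stage cascade: trust the low-precision stage when it is
    confident, otherwise fall back to the high-precision stage.
    Returns (predictions, fraction forwarded to the high stage)."""
    p_low = softmax(logits_low)
    confident = p_low.max(axis=1) >= threshold
    preds = np.where(confident,
                     p_low.argmax(axis=1),
                     softmax(logits_high).argmax(axis=1))
    return preds, 1.0 - confident.mean()

# Toy 4-class logits for 3 inputs from each stage.
low = np.array([[5.0, 0.0, 0.0, 0.0],    # confident -> answered by low stage
                [1.0, 0.9, 0.8, 0.7],    # ambiguous -> forwarded
                [0.0, 6.0, 0.0, 0.0]])   # confident -> answered by low stage
high = np.array([[5.0, 0.0, 0.0, 0.0],
                 [0.0, 0.0, 4.0, 0.0],
                 [0.0, 6.0, 0.0, 0.0]])
preds, forwarded_frac = cascade_predict(low, high)
```

Only the ambiguous middle input pays the cost of the high-precision stage, which is the source of the throughput gain.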
- We propose a novel data-dependent structured gradient regularizer to increase the robustness of neural networks vis-a-vis adversarial perturbations. Our regularizer can be derived as a controlled approximation from first principles, leveraging the fundamental link between training with noise and regularization. It adds very little computational overhead during learning and is simple to implement generically in standard deep learning frameworks. Our experiments provide strong evidence that structured gradient regularization can act as an effective first line of defense against attacks based on low-level signal corruption.
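The general idea of penalizing the loss's gradient with respect to the input can be illustrated for plain logistic regression, where the penalty has a closed form. Note this is a generic, unstructured input-gradient penalty for illustration only, not the paper's data-dependent structured regularizer; all names below are ours.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_loss(w, X, y, lam=0.1):
    """Logistic loss plus an input-gradient penalty. For this model the
    gradient of the per-example loss w.r.t. the input x is (p - y) * w,
    so the mean squared gradient norm is (p - y)^2 * ||w||^2."""
    p = sigmoid(X @ w)
    eps = 1e-12                                        # numerical safety
    nll = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    grad_pen = np.mean((p - y) ** 2) * np.dot(w, w)    # mean ||dL/dx||^2
    return nll + lam * grad_pen
```

In a deep learning framework the same penalty would be computed by automatic differentiation through the network, which is why the overhead is small.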
- May 23 2018 math.LO arXiv:1805.08732v1Given a cardinal $\lambda$, category forcing axioms for $\lambda$-suitable classes $\Gamma$ are strong forcing axioms which completely decide the theory of the Chang model $\mathcal C_\lambda$, modulo generic extensions via forcing notions from $\Gamma$. $\mathsf{MM}^{+++}$ was the first category forcing axiom to be isolated (by the second author). In this paper we present, without proofs, a general theory of category forcings, and prove the existence of $\aleph_1$-many pairwise incompatible category forcing axioms for $\omega_1$-suitable classes.
- We consider a new stochastic gradient descent algorithm for efficiently solving general min-max optimization problems that arise naturally in distributionally robust learning. Current approaches, which operate on the entire dataset at every step, do not scale well. We address this issue by initially focusing on a subset of the data and progressively expanding this support until it statistically covers the entire dataset.
- In recent years, the Word2Vec model trained with the Negative Sampling loss function has shown state-of-the-art results in a number of machine learning tasks, including language modeling tasks, such as word analogy and word similarity, and in recommendation tasks, through Prod2Vec, an extension that applies to modeling user shopping activity and user preferences. Several methods that aim to improve upon the standard Negative Sampling loss have been proposed. In our paper we pursue more sophisticated Negative Sampling, by leveraging ideas from the field of Generative Adversarial Networks (GANs), and propose Adversarial Negative Sampling. We build upon the recent progress made in stabilizing the training objective of GANs in the discrete data setting, and introduce a new GAN-Word2Vec model. We evaluate our model on the task of basket completion, and show significant improvements in performance over Word2Vec trained using standard loss functions, including Noise Contrastive Estimation and Negative Sampling.
- Combining Bayesian nonparametrics and a forward model selection strategy, we construct parsimonious Bayesian deep networks (PBDNs) that infer capacity-regularized network architectures from the data and require neither cross-validation nor fine-tuning when training the model. One of the two essential components of a PBDN is the development of a special infinite-wide single-hidden-layer neural network, whose number of active hidden units can be inferred from the data. The other one is the construction of a greedy layer-wise learning algorithm that uses a forward model selection criterion to determine when to stop adding another hidden layer. We develop both Gibbs sampling and stochastic gradient descent based maximum a posteriori inference for PBDNs, providing state-of-the-art classification accuracy and interpretable data subtypes near the decision boundaries, while maintaining low computational complexity for out-of-sample prediction.
- May 23 2018 cs.SI arXiv:1805.08718v1This paper explores the use of language models to predict 20 human traits from users' Facebook status updates. The data was collected by the myPersonality project, and includes user statuses along with their personality, gender, political identification, religion, race, satisfaction with life, IQ, self-disclosure, fair-mindedness, and belief in astrology. A single interpretable model matches state-of-the-art results on well-studied tasks such as predicting gender and personality, and sets the standard on other traits such as IQ, sensational interests, political identity, and satisfaction with life. Additionally, highly weighted words are published for each trait. These lists are valuable for creating hypotheses about human behavior, as well as for understanding what information a model is extracting. Using performance and extracted features we analyze models built on social media. The real world problems we explore include gendered classification bias and Cambridge Analytica's use of psychographic models.
- May 23 2018 cs.CV arXiv:1805.08717v1Reliable markerless motion tracking of multiple people participating in complex group activity from multiple handheld cameras is challenging due to frequent occlusions, strong viewpoint and appearance variations, and asynchronous video streams. The key to solving this problem is to reliably associate the same person across distant viewpoint and temporal instances. In this work, we combine motion tracking, mutual exclusion constraints, and multiview geometry in a multitask learning framework to automatically adapt a generic person appearance descriptor to the domain videos. Tracking is formulated as a spatiotemporally constrained clustering using the adapted person descriptor. Physical human constraints are exploited to reconstruct accurate and consistent 3D skeletons for every person across the entire sequence. We show significant improvement in association accuracy (up to 18%) in events with up to 60 people and 3D human skeleton reconstruction (5 to 10 times) over the baseline for events captured "in the wild".
- In this paper we consider a ranking problem in which we would like to order a set of items by utility or relevance, while also considering the visibility of different groups of items. To solve this problem, we adopt a supervised learning to rank approach that learns a ranking function from a set of training examples, which are queries and ranked lists of documents for each query. We consider that the elements to be ranked are divided into two groups: protected and non-protected. Following long-standing empirical observations showing that users of information retrieval systems rarely look past the first few results, we consider that some items receive more exposure than others. Our objective is to produce a ranker that is able to reproduce the ordering of the training set, which is the standard objective in learning to rank, but that additionally gives protected elements sufficient exposure, compared to non-protected elements. We demonstrate how to describe this objective formally, how to achieve it effectively and implement it, and present an experimental study describing how large differences in exposure can be reduced without having to introduce large distortions in the ranking utility.
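The notion of position-dependent exposure can be made concrete with the standard logarithmic position discount from information retrieval. This sketch (the function name and group encoding are ours, not the paper's) compares average exposure between protected and non-protected items in a single ranking:

```python
import math

def group_exposure(ranking, protected):
    """Average position-discounted exposure per group.
    ranking: list of item ids, best-ranked first.
    protected: set of ids belonging to the protected group.
    Uses the standard logarithmic discount 1 / log2(position + 1)."""
    exposure = {True: [], False: []}
    for pos, item in enumerate(ranking, start=1):
        exposure[item in protected].append(1.0 / math.log2(pos + 1))
    avg = lambda xs: sum(xs) / max(len(xs), 1)
    return avg(exposure[True]), avg(exposure[False])
```

A fairness-aware learning-to-rank objective would penalize large gaps between the two returned averages while preserving the utility ordering.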
- May 23 2018 physics.soc-ph physics.data-an arXiv:1805.08711v1The way the topological structure goes from a decoupled state into a coupled one in multiplex networks has been widely studied by means of analytical and numerical studies, involving models of artificial networks. In general, these experiments assume uniform interconnections between layers offering, on the one hand, an analytical treatment of the structural properties of multiplex networks but, on the other hand, losing applicability to real networks where heterogeneity of the links' weights is an intrinsic feature. In this paper, we study 2-layer multiplex networks of musicians whose layers correspond to empirical datasets containing, and linking, information on: (i) collaboration between them and (ii) musical similarities. In our model, connections between the collaboration and similarity layers exist, but they are not ubiquitous for all nodes. Specifically, inter-layer links are created (and weighted) based on structural resemblances between the neighborhoods of an artist, taking into account the level of interaction at each layer. Next, we evaluate the effect that the heterogeneity of the weights of the inter-layer links has on the structural properties of the whole network, namely the second smallest eigenvalue of the Laplacian matrix (algebraic connectivity). Our results show a transition in the value of the algebraic connectivity that is far from classical theoretical predictions where the weight of the inter-layer links is considered to be homogeneous.
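The algebraic connectivity referred to here is simply the second-smallest eigenvalue of the graph Laplacian $L = D - W$. A minimal sketch for a weighted adjacency matrix:

```python
import numpy as np

def algebraic_connectivity(adjacency):
    """Second-smallest eigenvalue of the (weighted) graph Laplacian L = D - W.
    Positive iff the graph is connected."""
    W = np.asarray(adjacency, dtype=float)
    L = np.diag(W.sum(axis=1)) - W                 # Laplacian: degree minus weights
    eigvals = np.sort(np.linalg.eigvalsh(L))       # ascending; L is symmetric
    return eigvals[1]
```

For a multiplex network the same computation applies to the supra-Laplacian of the full two-layer system, with the inter-layer link weights entering `W`.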
- May 23 2018 cs.CL arXiv:1805.08701v1Building tools for code-mixed data is rapidly gaining popularity in the NLP research community as such data is exponentially rising on social media. Working with code-mixed data poses several challenges, especially due to grammatical inconsistencies and spelling variations, in addition to all the previously known challenges of social media scenarios. In this article, we present a novel architecture focusing on normalizing phonetic typing variations, which are commonly seen in code-mixed data. One of the main features of our architecture is that in addition to normalizing, it can also be utilized for back-transliteration and word identification in some cases. Our model achieved an accuracy of 90.27% on the test data.
- May 23 2018 astro-ph.SR arXiv:1805.08697v1Eclipsing binary systems with pulsating components offer a unique possibility to accurately measure the most important parameters of pulsating stars, to study their evolution, and to test the pulsation theory. I will show what we can learn about the pulsating stars from the analysis of such systems and how we can do it. Special attention will be paid to the mass, radius, p-factor, and distance determination. Although the core of the method is based on the observations of double-lined eclipsing spectroscopic binaries, with the help of the pulsation theory it is also possible to measure absolute parameters for single-lined binaries.
- May 23 2018 cs.CV arXiv:1805.08685v1Recent research has widely explored the problem of aesthetics assessment of images with generic content. However, few approaches have been specifically designed to predict the aesthetic quality of images containing human faces, which make up a massive portion of photos on the web. This paper introduces a method for aesthetic quality assessment of images with faces. We exploit three different Convolutional Neural Networks to encode information regarding perceptual quality, global image aesthetics, and facial attributes; then, a model is trained to combine these features to explicitly predict the aesthetics of images containing faces. Experimental results show that our approach outperforms existing methods for both binary, i.e. low/high, and continuous aesthetic score prediction on four different state-of-the-art databases.
- May 23 2018 cs.CV arXiv:1805.08676v1We propose a geometric convexity shape prior preservation method for variational level set based image segmentation methods. Our method is built upon the fact that the level set of a convex signed distance function must be convex. This property enables us to transfer a complicated geometrical convexity prior into a simple inequality constraint on the function. An active set based Gauss-Seidel iteration is used to handle this constrained minimization problem, yielding an efficient algorithm. We apply our method to region and edge based level set segmentation models, including the Chan-Vese (CV) model, with the guarantee that the segmented region will be convex. Experimental results show the effectiveness and quality of the proposed model and algorithm.
- May 23 2018 astro-ph.IM arXiv:1805.08675v1Cherenkov light induced by radioactive decay products is one of the major sources of background light for deep-sea neutrino telescopes such as ANTARES. These decays are at the same time a powerful calibration source. Using data collected by the ANTARES neutrino telescope from mid 2008 to 2017, the time evolution of the photon detection efficiency of optical modules is studied. A modest loss of only 20% in 9 years is observed. The relative time calibration between adjacent modules is derived as well.
- Parameterizing the approximate posterior of a generative model with neural networks has become a common theme in recent machine learning research. While providing appealing flexibility, this approach makes it difficult to impose or assess structural constraints such as conditional independence. We propose a framework for learning representations that relies on Auto-Encoding Variational Bayes and whose search space is constrained via kernel-based measures of independence. In particular, our method employs the $d$-variable Hilbert-Schmidt Independence Criterion (dHSIC) to enforce independence between the latent representations and arbitrary nuisance factors. We show how to apply this method to a range of problems, including learning invariant representations and learning interpretable representations. We also present a full-fledged application to single-cell RNA sequencing (scRNA-seq). In this setting the biological signal is mixed in complex ways with sequencing errors and sampling effects. We show that our method outperforms the state-of-the-art in this domain.
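The Hilbert-Schmidt Independence Criterion underlying such penalties can be illustrated in its two-variable, biased-estimator form (the paper uses the $d$-variable generalization dHSIC); the Gaussian kernel and bandwidth below are our assumptions for the sketch:

```python
import numpy as np

def rbf_gram(x, sigma=1.0):
    """Gaussian-kernel Gram matrix for a 1-D sample."""
    d = x[:, None] - x[None, :]
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC between two 1-D samples.
    Zero in the population iff x and y are independent
    (for a characteristic kernel such as the Gaussian)."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    K, L = rbf_gram(x, sigma), rbf_gram(y, sigma)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```

Used as a training penalty, a term like this pushes the latent representation toward independence from the nuisance variable.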
- A regression method for proportional, or fractional, data with mixed effects is outlined, designed for analysis of datasets in which the outcomes have substantial weight at the bounds. In such cases a normal approximation is particularly unsuitable as it can result in incorrect inference. To resolve this problem, we employ a logistic regression model and then apply a bootstrap method to correct conservative confidence intervals. This paper outlines the theory of the method, and demonstrates its utility using simulated data. Working code for the R platform is provided through the package glmmboot, available on CRAN.
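The bootstrap-correction idea can be sketched as a generic percentile bootstrap over an arbitrary statistic. The actual glmmboot package resamples from fitted mixed-effects logistic models in R; this Python sketch is only an illustration of the resampling step, with all names ours.

```python
import numpy as np

def bootstrap_ci(data, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for an arbitrary statistic:
    resample the data with replacement, recompute the statistic, and take
    empirical quantiles of the resampled values."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    stats = np.array([
        statistic(rng.choice(data, size=len(data), replace=True))
        for _ in range(n_boot)
    ])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

For bounded proportion data the bootstrap interval respects the [0, 1] support, which a normal approximation does not.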
- May 23 2018 math.FA arXiv:1805.08667v1In this paper, we define and study Gevrey spaces in a subelliptic setting, that is, associated with a Hörmander family of vector fields and their corresponding sub-Laplacian. We show some natural relations between the various Gevrey spaces in this setting on general manifolds, and more particular properties on Lie groups with polynomial growth of the volume. In the case of the Heisenberg group and of $SU(2)$, we show that all our descriptions coincide.
- May 23 2018 math.AP arXiv:1805.08666v1In this manuscript we consider a non-local porous medium equation with non-local diffusion effects given by a fractional heat operator $\partial_t + (-\Delta)^s$ in three space dimensions for $3/4\le s\le 1$. Global in time existence of weak solutions is shown by employing a time semi-discretization of the equations, an energy inequality and a uniform estimate for the first moment of the solution to the discretized problem.
- We introduce a Bayesian Gaussian process latent variable model that explicitly captures spatial correlations in data using a parameterized spatial kernel and leveraging structure-exploiting algebra on the model covariance matrices for computational tractability. Inference is made tractable through a collapsed variational bound with similar computational complexity to that of the traditional Bayesian GP-LVM. Inference over partially-observed test cases is achieved by optimizing a "partially-collapsed" bound. Modeling high-dimensional time series systems is enabled through use of a dynamical GP latent variable prior. The model is demonstrated on examples that impute missing data in images and perform super-resolution imputation of missing video frames.
- May 23 2018 cs.CL arXiv:1805.08660v1Multimodal affective computing, learning to recognize and interpret human affects and subjective information from multiple data sources, is still challenging because: (i) it is hard to extract informative features to represent human affects from heterogeneous inputs; (ii) current fusion strategies only fuse different modalities at an abstract level, ignoring time-dependent interactions between modalities. Addressing such issues, we introduce a hierarchical multimodal architecture with attention and word-level fusion to classify utterance-level sentiment and emotion from text and audio data. Our introduced model outperforms the state-of-the-art approaches on published datasets, and we demonstrate that our model is able to visualize and interpret the synchronized attention over modalities.