- Mar 16 2018 quant-ph arXiv:1803.05886v1We propose a scheme to perform a Non-Clifford gate on a logical qudit encoded in a pair of $Z_N$ parafermionic zero modes via the Aharonov Casher effect. This gate can be implemented by moving a half fluxon around the pair of parafermionic zero modes that can be realized in a two-dimensional set-up via existing proposals. The half fluxon can be created as a part of half fluxon/anti-half fluxon pair in long Josephson junction with chiral superconductors and then moved around the parafermionic zero modes. Supplementing this gate with braiding of parafermions provides the avenue for universal quantum computing with parafermions.
- Mar 16 2018 quant-ph arXiv:1803.05904v1For every $d \geq 2$, we present a generalization of the CHSH inequality with the property that maximal violation self-tests the maximally entangled state of local dimension $d$. This is the first example of a family of inequalities with this property. Moreover, we provide a conjecture for a family of inequalities generalizing the tilted CHSH inequalities, and we conjecture that for each pure bipartite entangled state there is an inequality in the family whose maximal violation self-tests it. All of these inequalities are inspired by the self-testing correlations of [Nat. Comm. 8, 15485 (2017)].
- The exponential growth or decay with time of the out-of-time-order commutator (OTOC) is one widely used diagnostic of many-body chaos in spatially-extended systems. In studies of many-body classical chaos, it has been noted that one can define a velocity-dependent Lyapunov exponent, $\lambda({\bf v})$, which is the growth or decay rate along "rays" at that velocity. We examine the behavior of $\lambda({\bf v})$ for a variety of many-body systems, both chaotic and integrable. The so-called light cone for the spreading of operators is defined by $\lambda({\bf \hat n}v_B({\bf \hat n}))=0$, with a generally direction-dependent "butterfly speed" $v_B({\bf \hat n})$. In spatially local systems, $\lambda(v)$ is negative outside the light cone where it takes the form $\lambda(v) \sim -(v-v_B)^{\alpha}$ near $v_b$, with the exponent $\alpha$ taking on various values over the range of systems we examine. The regime inside the light cone with positive Lyapunov exponents may only exist for classical, semi-classical or large-$N$ systems, but not for "fully quantum" chaotic systems with strong short-range interactions and local Hilbert space dimensions of order one.
- Mar 16 2018 math.QA arXiv:1803.05568v1We introduce the notion of a $\textit{reflection fusion category}$, which is a type of a $G$-crossed category generated by objects of Frobenius-Perron dimension $1$ and $\sqrt{p}$, where $p$ is an odd prime. We show that such categories correspond to orthogonal reflection groups over $\mathbb{F}_p$. This allows us to use the known classification of irreducible reflection groups over finite fields to classify irreducible reflection fusion categories.
- Mar 16 2018 quant-ph arXiv:1803.05538v1Classical control noise is ubiquitous in qubit devices, making its accurate spectral characterization essential for designing optimized error suppression strategies at the physical level. Here, we focus on multiplicative Gaussian amplitude control noise on a driven qubit sensor and show that sensing protocols using optimally band-limited Slepian modulation offer substantial benefit in realistic scenarios. Special emphasis is given to laying out the theoretical framework necessary for extending non-parametric multitaper spectral estimation to the quantum setting by highlighting key points of contact and differences with respect to the classical formulation. In particular, we introduce and analyze two approaches (adaptive vs. single-setting) to quantum multitaper estimation, and show how they provide a practical means to both identify fine spectral features not otherwise detectable by existing protocols and to obtain reliable prior estimates for use in subsequent parametric estimation, including high-resolution Bayesian techniques. We quantitatively characterize the performance of both single- and multitaper Slepian estimation protocols by numerically reconstructing representative spectral densities, and demonstrate their advantage over dynamical-decoupling noise spectroscopy approaches in reducing bias from spectral leakage as well as in compensating for aliasing effects while maintaining a desired sampling resolution.
- Mar 16 2018 quant-ph arXiv:1803.05891v1We study quantum frequency estimation for $N$ qubits subjected to independent Markovian noise, via strategies based on time-continuous monitoring of the environment. Both physical intuition and an extended convexity property of the quantum Fisher information (QFI) suggest that these strategies are more effective than the standard ones based on the measurement of the unconditional state after the noisy evolution. Here we focus on initial GHZ states and on parallel or transverse noise. For parallel noise, i.e. dephasing, we show that perfectly efficient time-continuous photo-detection allows to recover the unitary (noiseless) QFI, and thus to obtain a Heisenberg scaling for every value of the monitoring time. For finite detection efficiency, one falls back to the noisy standard quantum limit scaling, but with a constant enhancement due to an effective reduced dephasing. Also in the transverse noise case we obtain that the Heisenberg scaling is recovered for perfectly efficient detectors, and we find that both homodyne and photo-detection based strategies are optimal. For finite detectors efficiency, our numerical simulations show that, as expected, an enhancement can be observed, but we cannot give any conclusive statement regarding the scaling. We finally describe in detail the stable and compact numerical algorithm that we have developed in order to evaluate the precision of such time-continuous estimation strategies, and that may find application in other quantum metrology schemes.
- We establish a large deviation principle for the Kardar-Parisi-Zhang (KPZ) equation, providing precise control over the left tail of the height distribution for narrow wedge initial condition. Our analysis exploits an exact connection between the KPZ one-point distribution and the Airy point process -- an infinite particle Coulomb-gas which arises at the spectral edge in random matrix theory. We develop the large deviation principle for the Airy point process and use it to compute, in a straight-forward and assumption-free manner, the KPZ large deviation rate function in terms of an electrostatic problem (whose solution we evaluate). This method also applies to the half-space KPZ equation, showing that its rate function is half of the full-space rate function. In addition to these long-time estimates, we provide rigorous proof of finite-time tail bounds on the KPZ distribution which demonstrate a crossover between exponential decay with exponent $3$ (in the shallow left tail) to exponent $5/2$ (in the deep left tail). The full-space KPZ rate function agrees with the one computed in Sasorov et al. [J. Stat. Mech, 063203 (2017)] via a WKB approximation analysis of a non-local, non-linear integro-differential equation generalizing Painlevé II which Amir et al. [Comm. Pure Appl. Math. 64, 466 (2011)] related to the KPZ one-point distribution.
- Variational inference relies on flexible approximate posterior distributions. Normalizing flows provide a general recipe to construct flexible variational posteriors. We introduce Sylvester normalizing flows, which can be seen as a generalization of planar flows. Sylvester normalizing flows remove the well-known single-unit bottleneck from planar flows, making a single transformation much more flexible. We compare the performance of Sylvester normalizing flows against planar flows and inverse autoregressive flows and demonstrate that they compare favorably on several datasets.
- Mar 16 2018 cs.LG arXiv:1803.05591v1Momentum based stochastic gradient methods such as heavy ball (HB) and Nesterov's accelerated gradient descent (NAG) method are widely used in practice for training deep networks and other supervised learning models, as they often provide significant improvements over stochastic gradient descent (SGD). Rigorously speaking, "fast gradient" methods have provable improvements over gradient descent only for the deterministic case, where the gradients are exact. In the stochastic case, the popular explanations for their wide applicability is that when these fast gradient methods are applied in the stochastic case, they partially mimic their exact gradient counterparts, resulting in some practical gain. This work provides a counterpoint to this belief by proving that there exist simple problem instances where these methods cannot outperform SGD despite the best setting of its parameters. These negative problem instances are, in an informal sense, generic; they do not look like carefully constructed pathological instances. These results suggest (along with empirical evidence) that HB or NAG's practical performance gains are a by-product of mini-batching. Furthermore, this work provides a viable (and provable) alternative, which, on the same set of problem instances, significantly improves over HB, NAG, and SGD's performance. This algorithm, referred to as Accelerated Stochastic Gradient Descent (ASGD), is a simple to implement stochastic algorithm, based on a relatively less popular variant of Nesterov's Acceleration. Extensive empirical results in this paper show that ASGD has performance gains over HB, NAG, and SGD.
- Mar 16 2018 quant-ph arXiv:1803.05586v1The aim of this book chapter is to indicate how quantum phenomena are affecting the operation of microscopic thermal machines, such as engines and refrigerators. As converting heat to work is one of the fundamental concerns in thermodynamics, the platform of quantum-thermal machines sheds light on thermodynamics in the quantum regime. This chapter focuses on the basic features of quantum mechanics, such as energy quantization, the uncertainty principle, quantum coherence and correlations, and their manifestation in microscopic thermal devices. In addition to indicating the peculiar behaviors of thermal-machines due to their non-classical features, we present quantum-thermodynamic signatures of these machines. Any violation of the classical bounds on thermodynamic measurements of these machines is a sufficient condition to conclude that quantum effects are present in the operation of that thermal machine. Experimental setups demonstrating some of the results are also presented.
- Mar 16 2018 math.AP arXiv:1803.05560v1We study the stationary Stokes system in divergence form. The coefficients are assumed to be merely measurable in one direction and have Dini mean oscillations in the other directions. We prove that if $(u,p)$ is a weak solution of the system, then $(Du,p)$ is bounded and its certain linear combinations are continuous. We also prove a weak type-$(1,1)$ estimate for $(Du,p)$ under a stronger assumption on the $L^1$-mean oscillation of the coefficients. The corresponding results up to the boundary on a half ball are also established. These results are new even for elliptic equations and systems.
- A bipartite graph $G(U,V;E)$ that admits a perfect matching is given. One player imposes a permutation $\pi$ over $V$, the other player imposes a permutation $\sigma$ over $U$. In the greedy matching algorithm, vertices of $U$ arrive in order $\sigma$ and each vertex is matched to the lowest (under $\pi$) yet unmatched neighbor in $V$ (or left unmatched, if all its neighbors are already matched). The obtained matching is maximal, thus matches at least a half of the vertices. The max-min greedy matching problem asks: suppose the first (max) player reveals $\pi$, and the second (min) player responds with the worst possible $\sigma$ for $\pi$, does there exist a permutation $\pi$ ensuring to match strictly more than a half of the vertices? Can such a permutation be computed in polynomial time? The main result of this paper is an affirmative answer for this question: we show that there exists a polytime algorithm to compute $\pi$ for which for every $\sigma$ at least $\rho > 0.51$ fraction of the vertices of $V$ are matched. We provide additional lower and upper bounds for special families of graphs, including regular and Hamiltonian. Interestingly, even for regular graphs with arbitrarily large degree (implying a large number of disjoint perfect matchings), there is no $\pi$ ensuring to match more than a fraction $8/9$ of the vertices. The max-min greedy matching problem solves an open problem regarding the welfare guarantees attainable by pricing in sequential markets with binary unit-demand valuations. In addition, it has implications for the size of the unique stable matching in markets with global preferences, subject to the graph structure.
- Mar 16 2018 cs.SE arXiv:1803.05889v1The ever-growing popularity of mobile phones has brought additional challenges to the software development lifecycle. Mobile applications (apps, for short) ought to provide the same set of features as conventional software, with limited resources: such as, limited processing capabilities, storage, screen and, not less important, power source. Although energy efficiency is a valuable requirement, developers often lack knowledge of best practices. In this paper, we study whether or not automatic refactoring can aid developers ship energy efficient apps. We leverage a tool, Leafactor, with five energy code smells that tend to go unnoticed. We use Leafactor to analyze code smells in 140 free and open source apps. As a result, we detected and fixed code smells in 45 apps, from which 40% have successfully merged our changes into the official repository.
- It is now established that nuclear quantum motion plays an important role in determining water's hydrogen bonding, structure, and dynamics. Such effects are important to consider when evaluating DFT functionals and attempting to develop better ones for water. The standard way of treating nuclear quantum effects, path integral molecular dynamics (PIMD), multiplies the number of energy/force calculations by the number of beads, which is typically 32 for room temperature water. Here we introduce a method whereby PIMD can be incorporated into a DFT molecular dynamics simulation with very little extra cost. The method is based on the many body expansion of the energy. We first subtract the DFT monomer energies & forces using a custom DFT-based monomer potential energy surface. The evolution of the PIMD beads is then performed using only the highly accurate Partridge-Schwenke monomer energy surface. DFT calculations are done using the centroid positions. We explore the relation between our method to multiple timestep algorithms, bead contraction, and other schemes that have been introduced to speed up PIMD. We show that our method, which we call "monomer PIMD" correctly captures the structure and nuclear delocalization of water found in full PIMD simulation but at much lower computational cost.
- Mar 16 2018 hep-ph arXiv:1803.05901v1Cosmic-ray interactions with the nuclei of the Earth's atmosphere produce a flux of neutrinos in all directions with energies extending above the TeV scale. However, the Earth is not a fully transparent medium for neutrinos with energies above a few TeV. At these energies, the charged-current neutrino-nucleon cross section is large enough so that the neutrino mean-free path in a medium with the Earth's density is comparable to the Earth's diameter. Therefore, when neutrinos of these energies cross the Earth, there is a non-negligible probability for them to be absorbed. Since this effect depends on the distance traveled by neutrinos and on their energy, studying the zenith and energy distributions of TeV atmospheric neutrinos passing through the Earth offers an opportunity to infer the Earth's density profile. Here we perform an Earth tomography with neutrinos using actual data, the publicly available one-year through-going muon sample of the atmospheric neutrino data of the IceCube neutrino telescope. We are able to determine the mass of the Earth, its moment of inertia, the mass of the Earth's core and to establish the core is denser than the mantle, using weak interactions only, in a way completely independent from gravitational measurements. Our results confirm that this can be achieved with current neutrino detectors. This method to study the Earth's internal structure, complementary to the traditional one from geophysics based on seismological data, is starting to provide useful information and it could become competitive as soon as more statistics is available thanks to the current and larger future neutrino detectors.
- In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows.
- Mar 16 2018 cs.CG arXiv:1803.05893v1We show that the problem of guarding an $x$-monotone terrain from an altitude line and the problem of guarding a uni-monotone polygon are equivalent. We present a polynomial time algorithm for both problems, and show that the cardinality of a minimum guard set and the cardinality of a maximum witness set coincide. Thus, uni-monotone polygons are perfect; this result also extends to monotone mountains.
- In this paper, we present GossipGraD - a gossip communication protocol based Stochastic Gradient Descent (SGD) algorithm for scaling Deep Learning (DL) algorithms on large-scale systems. The salient features of GossipGraD are: 1) reduction in overall communication complexity from \Theta(log(p)) for p compute nodes in well-studied SGD to O(1), 2) model diffusion such that compute nodes exchange their updates (gradients) indirectly after every log(p) steps, 3) rotation of communication partners for facilitating direct diffusion of gradients, 4) asynchronous distributed shuffle of samples during the feedforward phase in SGD to prevent over-fitting, 5) asynchronous communication of gradients for further reducing the communication cost of SGD and GossipGraD. We implement GossipGraD for GPU and CPU clusters and use NVIDIA GPUs (Pascal P100) connected with InfiniBand, and Intel Knights Landing (KNL) connected with Aries network. We evaluate GossipGraD using well-studied dataset ImageNet-1K (~250GB), and widely studied neural network topologies such as GoogLeNet and ResNet50 (current winner of ImageNet Large Scale Visualization Research Challenge (ILSVRC)). Our performance evaluation using both KNL and Pascal GPUs indicates that GossipGraD can achieve perfect efficiency for these datasets and their associated neural network topologies. Specifically, for ResNet50, GossipGraD is able to achieve ~100% compute efficiency using 128 NVIDIA Pascal P100 GPUs - while matching the top-1 classification accuracy published in literature.
- Mar 16 2018 cond-mat.mes-hall cond-mat.soft arXiv:1803.05877v1We consider the orientational alignment of dipoles due to strong matter light coupling, for a non-vanishing density of excitations. We compare various approaches to this problem in the limit of large numbers of emitters, and show that direct Monte Carlo integration, mean-field theory, and large deviation methods match exactly in this limit. All three results show that orientational alignment develops in the presence of a macroscopically occupied polariton mode, and that the dipoles asymptotically approach perfect alignment in the limit of high density or low temperature.
- Mar 16 2018 cs.CV arXiv:1803.05873v1Facial expressions are combinations of basic components called Action Units (AU). Recognizing AUs is key for developing general facial expression analysis. In recent years, most efforts in automatic AU recognition have been dedicated to learning combinations of local features and to exploiting correlations between Action Units. In this paper, we propose a deep neural architecture that tackles both problems by combining learned local and global features in its initial stages and replicating a message passing algorithm between classes similar to a graphical model inference approach in later stages. We show that by training the model end-to-end with increased supervision we improve state-of-the-art by 5.3\% and 8.2\% performance on BP4D and DISFA datasets, respectively.
- Mar 16 2018 cs.CV arXiv:1803.05872v1In this paper we introduce an ensemble method for convolutional neural network (CNN), called "virtual branching," which can be implemented with nearly no additional parameters and computation on top of standard CNNs. We propose our method in the context of person re-identification (re-ID). Our CNN model consists of shared bottom layers, followed by "virtual" branches, where neurons from a block of regular convolutional and fully-connected layers are partitioned into multiple sets. Each virtual branch is trained with different data to specialize in different aspects, e.g., a specific body region or pose orientation. In this way, robust ensemble representations are obtained against human body misalignment, deformations, or variations in viewing angles, at nearly no any additional cost. The proposed method achieves competitive performance on multiple person re-ID benchmark datasets, including Market-1501, CUHK03, and DukeMTMC-reID.
- Mar 16 2018 stat.ML arXiv:1803.05867v1Scientific fields such as insider-threat detection and highway-safety planning often lack sufficient amounts of time-series data to estimate statistical models for the purpose of scientific discovery. Moreover, the available limited data are quite noisy. This presents a major challenge when estimating time-series models that are robust to overfitting and have well-calibrated uncertainty estimates. Most of the current literature in these fields involve visualizing the time-series for noticeable structure and hard coding them into pre-specified parametric functions. This approach is associated with two limitations. First, given that such trends may not be easily noticeable in small data, it is difficult to explicitly incorporate expressive structure into the models during formulation. Second, it is difficult to know $\textit{a priori}$ the most appropriate functional form to use. To address these limitations, a nonparametric Bayesian approach was proposed to implicitly capture hidden structure from time series having limited data. The proposed model, a Gaussian process with a spectral mixture kernel, precludes the need to pre-specify a functional form and hard code trends, is robust to overfitting and has well-calibrated uncertainty estimates.
- Mar 16 2018 cond-mat.mes-hall arXiv:1803.05865v1The two-temperature model (2TM) introduced by Kaganov, Lifshitz, and Tanatarov is widely used to describe the energy relaxation in the electron-phonon system of a metallic film. At the same time, the accuracy of the description of the electron-phonon system in terms of 2TM, i.e. on the basis of the electron and phonon temperatures, has not been considered in detail until now. In this paper we present a microscopic theory of cooling of instantly heated electrons in metallic films. In framework of this theory the main features of electron cooling in thick and thin films were found, and an analysis of the accuracy of the 2TM in the low-temperature region was carried out. We consider a more accurate three-temperature model, which (in contrast to 2TM) explicitly takes into account phonons with angles of incidence exceeding the angle of total internal reflection. The contribution of these phonons to the kinetics of electron relaxation can be significant if the sound velocities in the film and the substrate are quite different.
- Mar 16 2018 cs.CV arXiv:1803.05863v1For lossy image compression systems, we develop an algorithm called iterative refinement, to improve the decoder's reconstruction compared with standard decoding techniques. Specifically, we propose a recurrent neural network approach for nonlinear, iterative decoding. Our neural decoder, which can work with any encoder, employs self-connected memory units that make use of both causal and non-causal spatial context information to progressively reduce reconstruction error over a fixed number of steps. We experiment with variations of our proposed estimator and obtain as much as a 0.8921 decibel (dB) gain over the standard JPEG algorithm and a 0.5848 dB gain over a state-of-the-art neural compression model.
- Self-replication is a key aspect of biological life that has been largely overlooked in Artificial Intelligence systems. Here we describe how to build and train self-replicating neural networks. The network replicates itself by learning to output its own weights. The network is designed using a loss function that can be optimized with either gradient-based or non-gradient-based methods. We also describe a method we call regeneration to train the network without explicit optimization, by injecting the network with predictions of its own parameters. The best solution for a self-replicating network was found by alternating between regeneration and optimization steps. Finally, we describe a design for a self-replicating neural network that can solve an auxiliary task such as MNIST image classification. We observe that there is a trade-off between the network's ability to classify images and its ability to replicate, but training is biased towards increasing its specialization at image classification at the expense of replication. This is analogous to the trade-off between reproduction and other tasks observed in nature. We suggest that a self-replication mechanism for artificial intelligence is useful because it introduces the possibility of continual improvement through natural selection.
- Mar 16 2018 cs.CV arXiv:1803.05858v1In this work, we present a novel and effective framework to facilitate object detection with the instance-level segmentation information that is only supervised by bounding box annotation. Starting from the joint object detection and instance segmentation network, we propose to recursively estimate the pseudo ground-truth object masks from the instance-level object segmentation network training, and then enhance the detection network with top-down segmentation feedbacks. The pseudo ground truth mask and network parameters are optimized alternatively to mutually benefit for each other. To obtain the promising pseudo masks in each iteration, we embed a graphical inference that incorporates the low-level image appearance consistency and the bounding box annotations to refine the segmentation masks predicted by the segmentation network. Our approach progressively improves the object detection performance by incorporating the detailed pixel-wise information learned from the weakly-supervised segmentation network. Extensive evaluation on the detection task in PASCAL VOC 2007 and 2012 [12] verifies that the proposed approach is effective.
- Mar 16 2018 astro-ph.HE arXiv:1803.05856v1Gamma-ray burst (GRB) jets are narrow, and thus typically point away from us. They are initially ultra-relativistic, causing their prompt $\gamma$-ray and early afterglow emission to be beamed away from us. However, as the jet gradually decelerates its beaming cone widens and eventually reaches our line of sight and the afterglow emission may be detected. Such orphan afterglows were not clearly detected so far. Nevertheless, they should be detected in upcoming optical or radio surveys, and it would be challenging to clearly distinguish between them and other types of transients. Therefore, we perform detailed, realistic calculations of the expected afterglow emission from GRB jets viewed at different angles from the jet's symmetry axis. The dynamics are calculated using 2D relativistic hydrodynamics simulations of jets propagating into different power-law external density profiles, $\rho_{\rm ext}\propto{}R^{-k}$ for $k=0,\,1,\,1.5,\,2$, ranging from a uniform ISM-like medium ($k=0$) to a stratified steady stellar-wind like profile ($k=2$). We calculate radio, optical and X-ray lightcurves, and the evolution of the radio afterglow image size, shape and flux centroid. This may help identify misaligned relativistic jets, whether initially ultra-relativistic and producing a GRB for observers within their beam, or (possibly intrinsically more common) moderately relativistic, in either (i) nearby supernovae Ib/c (some of which are associated with long duration GRBs), or (ii) in binary neutron star mergers, which may produce short duration GRBs, and may also be detected in gravitational waves (e.g. GW$\,$170827/GRB$\,$170817A with a weak prompt $\gamma$-ray emission may harbor an off-axis jet).
- Mar 16 2018 q-fin.MF arXiv:1803.05831v1We introduce a new approach to incorporate uncertainty into the decision to invest in a commodity reserve. The investment is an irreversible one-off capital expenditure, after which the investor receives a stream of cashflow from extracting the commodity and selling it on the spot market. The investor is exposed to price uncertainty and uncertainty in the amount of available resources in the reserves (i.e. technical uncertainty). She does, however, learn about the reserve levels through time, which is a key determinant in the decision to invest. To model the reserve level uncertainty and how she learns about the estimates of the commodity in the reserve, we adopt a continuous-time Markov chain model to value the option to invest in the reserve and investigate the value that learning has prior to investment.
- Feature learning on point clouds has shown great promise, with the introduction of effective and generalizable deep learning frameworks such as pointnet++. Thus far, however, point features have been abstracted in an independent and isolated manner, ignoring the relative layout of neighboring points as well as their features. In the present article, we propose to overcome this limitation by using spectral graph convolution on a local graph, combined with a novel graph pooling strategy. In our approach, graph convolution is carried out on a nearest neighbor graph constructed from a point's neighborhood, such that features are jointly learned. We replace the standard max pooling step with a recursive clustering and pooling strategy, devised to aggregate information from within clusters of nodes that are close to one another in their spectral coordinates, leading to richer overall feature descriptors. Through extensive experiments on diverse datasets, we show a consistent demonstrable advantage for the tasks of both point set classification and segmentation.
- Mar 16 2018 cs.CL arXiv:1803.05820v1The paper gives an overview of the Russian Semantic Similarity Evaluation (RUSSE) shared task held in conjunction with the Dialogue 2015 conference. There exist a lot of comparative studies on semantic similarity, yet no analysis of such measures was ever performed for the Russian language. Exploring this problem for the Russian language is even more interesting, because this language has features, such as rich morphology and free word order, which make it significantly different from English, German, and other well-studied languages. We attempt to bridge this gap by proposing a shared task on the semantic similarity of Russian nouns. Our key contribution is an evaluation methodology based on four novel benchmark datasets for the Russian language. Our analysis of the 105 submissions from 19 teams reveals that successful approaches for English, such as distributional and skip-gram models, are directly applicable to Russian as well. On the one hand, the best results in the contest were obtained by sophisticated supervised models that combine evidence from different sources. On the other hand, completely unsupervised approaches, such as a skip-gram model estimated on a large-scale corpus, were able score among the top 5 systems.
- We extend the idea of end-to-end learning of communications systems through deep neural network (NN)-based autoencoders to orthogonal frequency division multiplexing (OFDM) with cyclic prefix (CP). Our implementation has the same benefits as a conventional OFDM system, namely singletap equalization and robustness against sampling synchronization errors, which turned out to be one of the major challenges in previous single-carrier implementations. This enables reliable communication over multipath channels and makes the communication scheme suitable for commodity hardware with imprecise oscillators. We show that the proposed scheme can be realized with state-of-the-art deep learning software libraries as transmitter and receiver solely consist of differentiable layers required for gradient-based training. We compare the performance of the autoencoder-based system against that of a state-of-the-art OFDM baseline over frequency-selective fading channels. Finally, the impact of a non-linear amplifier is investigated and we show that the autoencoder inherently learns how to deal with such hardware impairments.
- Mar 16 2018 cs.LG arXiv:1803.05814v1We present data-dependent learning bounds for the general scenario of non-stationary non-mixing stochastic processes. Our learning guarantees are expressed in terms of a data-dependent measure of sequential complexity and a discrepancy measure that can be estimated from data under some mild assumptions. We also also provide novel analysis of stable time series forecasting algorithm using this new notion of discrepancy that we introduce. We use our learning bounds to devise new algorithms for non-stationary time series forecasting for which we report some preliminary experimental results.
- We show that a natural discretisation of Virasoro algebra yields a quantum integrable model which is the Toda chain in the second Hamiltonian structure.
- Mar 16 2018 math.NT arXiv:1803.05800v1The purpose of this note is twofold. First, we survey results on the construction of large class groups of number fields by specialization of finite covers of curves. Then we give examples of applications of these techniques.
- Object ranking is an important problem in the realm of preference learning. On the basis of training data in the form of a set of rankings of objects, which are typically represented as feature vectors, the goal is to learn a ranking function that predicts a linear order of any new set of objects. Current approaches commonly focus on ranking by scoring, i.e., on learning an underlying latent utility function that seeks to capture the inherent utility of each object. These approaches, however, are not able to take possible effects of context-dependence into account, where context-dependence means that the utility or usefulness of an object may also depend on what other objects are available as alternatives. In this paper, we formalize the problem of context-dependent ranking and present two general approaches based on two natural representations of context-dependent ranking functions. Both approaches are instantiated by means of appropriate neural network architectures. We demonstrate empirically that our methods outperform traditional approaches on benchmark tasks, for which context-dependence is playing a relevant role.
- Mar 16 2018 cs.CL arXiv:1803.05795v1The paper describes the results of the first shared task on word sense induction (WSI) for the Russian language. While similar shared tasks were conducted in the past for some Romance and Germanic languages, we explore the performance of sense induction and disambiguation methods for a Slavic language that shares many features with other Slavic languages, such as rich morphology and free word order. The participants were asked to group contexts with a given word in accordance with its senses that were not provided beforehand. For instance, given a word "bank" and a set of contexts with this word, e.g. "bank is a financial institution that accepts deposits" and "river bank is a slope beside a body of water", a participant was asked to cluster such contexts in the unknown in advance number of clusters corresponding to, in this case, the "company" and the "area" senses of the word "bank". For the purpose of this evaluation campaign, we developed three new evaluation datasets based on sense inventories that have different sense granularity. The contexts in these datasets were sampled from texts of Wikipedia, the academic corpus of Russian, and an explanatory dictionary of Russian. Overall 18 teams participated in the competition submitting 383 models. Multiple teams managed to substantially outperform competitive state-of-the-art baselines from the previous years based on sense embeddings.
- This paper introduces a general class of hierarchical nonparametric prior distributions. The random probability measures are constructed by a hierarchy of generalized species sampling processes with possibly non-diffuse base measures. The proposed framework provides a general probabilistic foundation for hierarchical random measures with either atomic or mixed base measures and allows for studying their properties, such as the distribution of the marginal and total number of clusters. We show that hierarchical species sampling models have a Chinese Restaurants Franchise representation and can be used as prior distributions to undertake Bayesian nonparametric inference. We provide a method to sample from the posterior distribution together with some numerical illustrations. Our class of priors includes some new hierarchical mixture priors such as the hierarchical Gnedin measures, and other well-known prior distributions such as the hierarchical Pitman-Yor and the hierarchical normalized random measures.
- Mar 16 2018 cs.CV arXiv:1803.05790v1We present an effective dynamic clustering algorithm for the task of temporal human action segmentation, which has comprehensive applications such as robotics, motion analysis, and patient monitoring. Our proposed algorithm is unsupervised, fast, generic to process various types of features, and applicable in both the online and offline settings. We perform extensive experiments of processing data streams, and show that our algorithm achieves the state-of-the-art results for both online and offline settings.
- Mar 16 2018 math.RA arXiv:1803.05780v1We define new generalized factorials in several variables over an arbitrary subset $\underline{S} \subseteq R^n,$ where $R$ is a Dedekind domain and $n$ is a positive integer. We then study the properties of the fixed divisor $d(\underline{S},f)$ of a multivariate polynomial $f \in R[x_1,x_2, \ldots, x_n]$. We generalize the results of Polya, Bhargava, Gunji & McQuillan and strengthen that of Evrard, all of which relate the fixed divisor to generalized factorials of $\underline{S}$. We also express $d(\underline{S},f)$ in terms of the images $f(\underline{a})$ of finitely many elements $\underline{a} \in R^n$, generalizing a result of Hensel, and in terms of the coefficients of $f$ under explicit bases.
- Mar 16 2018 stat.ML arXiv:1803.05776v1We propose Gaussian processes for signals over graphs (GPG) using the apriori knowledge that the target vectors lie over a graph. We incorporate this information using a graph- Laplacian based regularization which enforces the target vectors to have a specific profile in terms of graph Fourier transform coeffcients, for example lowpass or bandpass graph signals. We discuss how the regularization affects the mean and the variance in the prediction output. In particular, we prove that the predictive variance of the GPG is strictly smaller than the conventional Gaussian process (GP) for any non-trivial graph. We validate our concepts by application to various real-world graph signals. Our experiments show that the performance of the GPG is superior to GP for small training data sizes and under noisy training.
- We consider the problem of predicting plausible missing facts in relational data, given a set of imperfect logical rules. In particular, our aim is to provide bounds on the (expected) number of incorrect inferences that are made in this way. Since for classical inference it is in general impossible to bound this number in a non-trivial way, we consider two inference relations that weaken, but remain close in spirit to classical inference.
- Mar 16 2018 cs.CV arXiv:1803.05759v1Saliency prediction is a well studied problem in computer vision. Early saliency models were based on low-level hand-crafted feature derived from insights gained in neuroscience and psychophysics. In the wake of deep learning breakthrough, a new cohort of models were proposed based on neural network architectures, allowing significantly higher gaze prediction than previous shallow models, on all metrics. However, most models treat the saliency prediction as a \textitregression problem, and accurate regression of high-dimensional data is known to be a hard problem. Furthermore, it is unclear that intermediate levels of saliency (ie, neither very high, nor very low) are meaningful: Something is either salient, or it is not. Drawing from those two observations, we reformulate the saliency prediction problem as a salient region \textitsegmentation problem. We demonstrate that the reformulation allows for faster convergence than the classical regression problem, while performance is comparable to state-of-the-art. We also visualise the general features learned by the model, which are showed to be consistent with insights from psychophysics.
- Mar 16 2018 math.NT arXiv:1803.05758v1A few years ago new quantitative measures of pseudorandomness of binary sequences have been introduced. Since that these measures have been studied in many papers and many constructions have been given along these lines. In this paper the connection between the new measures and the NIST tests is analyzed. It is shown that finite binary sequences possessing strong pseudorandom properties in terms of these new measures usually also pass or nearly pass most of the NIST tests.
- Mar 16 2018 cs.CV arXiv:1803.05753v1Deep convolutional neural networks have demonstrated high performances for fixation prediction in recent years. How they achieve this, however, is less explored and they remain to be black box models. Here, we attempt to shed light on the internal structure of deep saliency models and study what features they extract for fixation prediction. Specifically, we use a simple yet powerful architecture, consisting of only one CNN and a single resolution input, combined with a new loss function for pixel-wise fixation prediction during free viewing of natural scenes. We show that our simple method is on par or better than state-of-the-art complicated saliency models. Furthermore, we propose a method, related to saliency model evaluation metrics, to visualize deep models for fixation prediction. Our method reveals the inner representations of deep models for fixation prediction and provides evidence that saliency, as experienced by humans, is likely to involve high-level semantic knowledge in addition to low-level perceptual cues. Our results can be useful to measure the gap between current saliency models and the human inter-observer model and to build new models to close this gap.
- Rearranging objects on a tabletop surface by means of nonprehensile manipulation is a task which requires skillful interaction with the physical world. Usually, this is achieved by precisely modeling physical properties of the objects, robot, and the environment for explicit planning. In contrast, as explicitly modeling the physical environment is not always feasible and involves various uncertainties, we learn a nonprehensile rearrangement strategy with deep reinforcement learning based on only visual feedback. For this, we model the task with rewards and train a deep Q-network. Our potential field-based heuristic exploration strategy reduces the amount of collisions which lead to suboptimal outcomes and we actively balance the training set to avoid bias towards poor examples. Our training process leads to quicker learning and better performance on the task as compared to uniform exploration and standard experience replay. We demonstrate empirical evidence from simulation that our method leads to a success rate of 85%, show that our system can cope with sudden changes of the environment, and compare our performance with human level performance.
- Mar 16 2018 cs.CV arXiv:1803.05748v1Many computer vision pipelines involve dynamic programming primitives such as finding a shortest path or the minimum energy solution in a tree-shaped probabilistic graphical model. In such cases, extracting not merely the best, but the set of M-best solutions is useful to generate a rich collection of candidate proposals that can be used in downstream processing. In this work, we show how M-best solutions of tree-shaped graphical models can be obtained by dynamic programming on a special graph with M layers. The proposed multi-layer concept is optimal for searching M-best solutions, and so flexible that it can also approximate M-best diverse solutions. We illustrate the usefulness with applications to object detection, panorama stitching and centerline extraction. Note: We have observed that an assumption in section 4 of our paper is not always fulfilled, see the attached corrigendum for details.
- Mar 16 2018 cs.CV arXiv:1803.05729v1While the research on convolutional neural networks (CNNs) is progressing quickly, the real-world deployment of these models is often limited by computing resources and memory constraints. In this paper, we address this issue by proposing a novel filter pruning method to compress and accelerate CNNs. Our work is based on the linear relationship identified in different feature map subspaces via visualization of feature maps. Such linear relationship implies that the information in CNNs is redundant. Our method eliminates the redundancy in convolutional filters by applying subspace clustering to feature maps. In this way, most of the representative information in the network can be retained in each cluster. Therefore, our method provides an effective solution to filter pruning for which most existing methods directly remove filters based on simple heuristics. The proposed method is independent of the network structure, thus it can be adopted by any off-the-shelf deep learning libraries. Experiments on different networks and tasks show that our method outperforms existing techniques before fine-tuning, and achieves the state-of-the-art results after fine-tuning.
- Even though many approaches have been proposed for entity resolution (ER), it remains very challenging to find one with quality guarantees. To this end, we propose an interactive HUman and Machine cOoperation framework for ER, denoted by i-HUMO. Similar to the existing HUMO framework, i-HUMO enforces both precision and recall levels by dividing an ER workload between the human and the machine. It essentially makes the machine label easy instances while assigning more challenging instances to the human. However, i-HUMO is a major improvement over HUMO in that it is interactive: its process of human workload selection is optimized based on real-time risk analysis on human-labeled results as well as pre-specified machine metrics. In this paper, we first introduce the i-HUMO framework and then present the risk analysis technique to prioritize the instances for manual labeling. Finally, we empirically evaluate i-HUMO's performance on real data. Our extensive experiments show that i-HUMO is effective in enforcing quality guarantees, and compared with the state-of-the-art alternatives, it can achieve better quality control with reduced human cost.
- Mar 16 2018 cond-mat.stat-mech arXiv:1803.05708v1The typical values and fluctuations of time-integrated observables of nonequilibrium processes driven in steady states are known to be characterized by large deviation functions, generalizing the entropy and free energy to nonequilibrium systems. The definition of these functions involves a scaling limit, similar to the thermodynamic limit, in which the integration time $\tau$ appears linearly, unless the process considered has long-range correlations, in which case $\tau$ is generally replaced by $\tau^\xi$ with $\xi\neq 1$. Here we show that such an anomalous power-law scaling in time of large deviations can also arise without long-range correlations in Markovian processes as simple as the Langevin equation. We describe the mechanism underlying this scaling using path integrals and discuss its physical consequences for more general processes.
- Mar 16 2018 math.AP arXiv:1803.05687v1We present a weighted version of the Caffarelli-Kohn-Nirenberg inequality in the framework of variable exponents. The combination of this inequality with a variant of the fountain theorem, yields the existence of infinitely many solutions for a class of non-homogeneous problems with Dirichlet boundary condition.