- We apply and extend the theory of universal recovery channels from quantum information theory to address the problem of entanglement wedge reconstruction in AdS/CFT. It has recently been proposed that any low-energy local bulk operators in a CFT boundary region's entanglement wedge can be reconstructed on that boundary region itself. Existing work arguing for this proposal relies on algebraic consequences of the exact equivalence between bulk and boundary relative entropies, namely the theory of operator algebra quantum error correction. However, bulk and boundary relative entropies are only approximately equal in bulk effective field theory, and in similar situations it is known that predictions from exact entropic equalities can be qualitatively incorrect. The framework of universal recovery channels provides a robust demonstration of the entanglement wedge reconstruction conjecture in addition to new physical insights. Most notably, we find that a bulk operator acting in a given boundary region's entanglement wedge can be expressed as the response of the boundary region's modular Hamiltonian to a perturbation of the bulk state in the direction of the bulk operator. This formula can be interpreted as a noncommutative version of Bayes' rule that attempts to undo the noise induced by restricting to only a portion of the boundary, and has an integral representation in terms of modular flows. We illustrate the application of our formula in the 2+1 dimensional AdS-Rindler case, finding that it expresses local bulk operators in the AdS-Rindler wedge in terms of modes in the corresponding boundary region. To reach these conclusions, we extend the theory of universal recovery channels to finite-dimensional operator algebras and demonstrate that recovery channels approximately preserve the multiplicative structure of the operator algebra.
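As a concrete illustration of the modular-flow integral representation mentioned above, one explicit universal recovery channel in the literature (the "twirled" Petz map of Junge, Renner, Sutter, Wilde, and Winter) takes the schematic form

```latex
\mathcal{R}_{\sigma,\mathcal{N}}(X) \;=\; \int_{-\infty}^{\infty} dt\, \beta_0(t)\,
  \sigma^{\frac{1-it}{2}}\, \mathcal{N}^{\dagger}\!\left(
  \mathcal{N}(\sigma)^{-\frac{1-it}{2}}\, X\, \mathcal{N}(\sigma)^{-\frac{1+it}{2}}
  \right) \sigma^{\frac{1+it}{2}},
\qquad
\beta_0(t) \;=\; \frac{\pi}{2}\,\bigl(\cosh(\pi t)+1\bigr)^{-1}.
```

The factors $\sigma^{\pm it}$ and $\mathcal{N}(\sigma)^{\pm it}$ implement modular flows, which is how such recovery formulas acquire an integral representation in terms of modular flow; the precise form used in the paper may differ from this sketch.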
- We study lower bounds on the optimal error probability in classical coding over classical-quantum channels at rates below the capacity, commonly termed quantum sphere-packing bounds. Winter and Dalai have derived such bounds for classical-quantum channels; however, the exponents in their bounds only coincide when the channel is classical. In this paper, we show that these two exponents admit a variational representation and are related by the Golden-Thompson inequality, reaffirming that Dalai's expression is stronger for general classical-quantum channels. Second, we establish a sphere-packing bound for classical-quantum channels which significantly improves Dalai's prefactor, from subexponential to polynomial order. Furthermore, the gap between the obtained error exponent for constant composition codes and the best known classical random coding exponent vanishes in the order of $o(\log n / n)$, indicating that our sphere-packing bound is almost exact in the high rate regime. Finally, for a special class of symmetric classical-quantum channels, we completely characterize the optimal error probability without the constant composition code assumption. The main technical contributions are two converse Hoeffding bounds for quantum hypothesis testing and the saddle-point properties of error exponent functions.
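For reference, the Golden-Thompson inequality that relates the two exponents states that, for Hermitian matrices $A$ and $B$,

```latex
\operatorname{Tr}\, e^{A+B} \;\le\; \operatorname{Tr}\!\left( e^{A} e^{B} \right),
```

with equality when $A$ and $B$ commute; in the classical (commuting) case the two sides coincide, consistent with the two exponents agreeing exactly when the channel is classical.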
- Holant problems are a family of counting problems on graphs, parametrised by sets of complex-valued functions of Boolean inputs. Holant^c denotes a subfamily of those problems, where any function set considered must contain the two unary functions pinning inputs to values 0 or 1. The complexity classification of Holant problems usually takes the form of dichotomy theorems, showing that for any set of functions in the family, the problem is either #P-hard or it can be solved in polynomial time. Previous such results include a dichotomy for real-valued Holant^c and one for Holant^c with complex symmetric functions. Here, we derive a dichotomy theorem for Holant^c with complex-valued, not necessarily symmetric functions. The tractable cases are the complex-valued generalisations of the tractable cases of the real-valued Holant^c dichotomy. The proof uses results from quantum information theory, particularly about entanglement.
- We consider quantum information tasks in an operator algebraic setting, where we study normal states on von Neumann algebras. In particular, we consider subfactors $\mathfrak{N} \subset \mathfrak{M}$, that is, unital inclusions of von Neumann algebras with trivial center. One can ask the following question: given a normal state $\omega$ on $\mathfrak{M}$, how much can one learn by only doing measurements from $\mathfrak{N}$? We argue that the Jones index $[\mathfrak{M}:\mathfrak{N}]$ can be used to give a quantitative answer to this question, showing how the rich theory of subfactors can be used in a quantum information context. As an example, we discuss how the Jones index can be used in the context of wiretap channels. Subfactors also occur naturally in physics. Here we discuss two examples: rational conformal field theories and Kitaev's toric code on the plane, a prototypical example of a topologically ordered model. There we can directly relate aspects of the general setting to physical properties such as the quantum dimension of the excitations. In the example of the toric code, we also show how to calculate the index via an approximation with finite-dimensional systems. This explicit construction sheds more light on the connection between topological order and the Jones index.
- Apr 20 2017 quant-ph arXiv:1704.05469v2: Correlations between spacelike separated measurements on entangled quantum systems are stronger than any classical correlations and are at the heart of numerous quantum technologies. In practice, however, spacelike separation is often not guaranteed, and we typically face situations where measurements have an underlying time order. Here we aim to provide a fair comparison of classical and quantum models of temporal correlations on a single particle, as well as timelike-separated correlations on multiple particles. We use a causal modeling approach to show, in theory and experiment, that quantum correlations outperform their classical counterparts when allowed equal but limited communication resources. This provides a clearer picture of the role of quantum correlations in timelike-separated scenarios, which play an important role in foundational and practical aspects of quantum information processing.
- Apr 20 2017 quant-ph arXiv:1704.05734v2: The term Einstein-Podolsky-Rosen steering refers to a quantum correlation intermediate between entanglement and Bell nonlocality, which has been connected to another fundamental quantum property: measurement incompatibility. In the finite-dimensional case, efficient computational methods to quantify steerability have been developed. In the infinite-dimensional case, however, fewer theoretical tools are available. Here, we approach the problem of steerability in the continuous variable case via a notion of state-channel correspondence, which generalizes the well-known Choi-Jamiołkowski correspondence. Via our approach we are able to generalize the connection between steering and incompatibility to the continuous variable case and to connect the steerability of a state with the incompatibility breaking property of a quantum channel, e.g., for noisy NOON states and amplitude damping channels. Moreover, we apply our methods to the Gaussian steering setting, proving, among other things, that canonical quadratures are sufficient for steering Gaussian states.
- Maximum likelihood estimation (MLE) is one of the most important methods in machine learning, and the expectation-maximization (EM) algorithm is often used to obtain maximum likelihood estimates. However, EM depends heavily on its initial configuration and can fail to find the global optimum. On the other hand, in the field of physics, quantum annealing (QA) was proposed as a novel optimization approach. Motivated by QA, we propose a quantum annealing extension of EM, which we call the deterministic quantum annealing expectation-maximization (DQAEM) algorithm. We also discuss its advantage in terms of the path integral formulation. Furthermore, through numerical simulations, we illustrate how it works for MLE and show that DQAEM outperforms EM.
- Apr 20 2017 quant-ph arXiv:1704.05755v1: Coherence, the superposition of orthogonal quantum states, is indispensable in various quantum processes. Inspired by the polynomial invariants used for classifying and quantifying entanglement, we first define polynomial coherence measures and systematically investigate their properties. Except for the qubit case, we show that there is no polynomial coherence measure satisfying the criterion that its value is zero if and only if the state is incoherent. We then relax this strict criterion and obtain a necessary condition for polynomial coherence measures. Furthermore, we give a typical example of a polynomial coherence measure for pure states and extend it to mixed states via a convex-roof construction. An analytical formula for our convex-roof polynomial coherence measure is obtained for symmetric states that are invariant under arbitrary basis permutations. For general mixed states, we give a lower bound on our coherence measure.
- Apr 20 2017 quant-ph arXiv:1704.05593v2: We propose and experimentally demonstrate an efficient framework for the quantum simulation of quantum channels in NMR. Our approach relies on a suitable decomposition of non-unitary operators into a linear combination of $d$ unitary ones, which can then be experimentally implemented with the assistance of a number of ancillary qubits that grows logarithmically in $d$. As a proof-of-principle demonstration, we realize the quantum simulation of three single-qubit quantum channels: the phase damping (PD), amplitude damping (AD), and depolarizing (DEP) channels. For these paradigmatic cases, we measure key features, such as the fidelity with the initial state and the associated von Neumann entropy for a qubit evolving through these channels. Our experiments are carried out using nuclear spins in a liquid sample and NMR control techniques.
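The decomposition idea can be checked numerically in a simple case: single-qubit phase damping is equivalent to a convex (hence linear) combination of just two unitaries, the identity and the Pauli $Z$. A minimal sketch (the damping parameter and test state are illustrative, not from the paper):

```python
import numpy as np

Z = np.diag([1.0, -1.0])
lam = 0.3  # illustrative damping parameter

# Kraus form of single-qubit phase damping
K0 = np.diag([1.0, np.sqrt(1 - lam)])
K1 = np.diag([0.0, np.sqrt(lam)])

def apply_kraus(rho):
    return K0 @ rho @ K0.conj().T + K1 @ rho @ K1.conj().T

# The same channel as a convex combination of the unitaries I and Z
p = (1 - np.sqrt(1 - lam)) / 2
def apply_unitary_mix(rho):
    return (1 - p) * rho + p * Z @ rho @ Z.conj().T

rho = np.array([[0.6, 0.3 - 0.1j], [0.3 + 0.1j, 0.4]])  # a valid density matrix
out_k = apply_kraus(rho)
out_u = apply_unitary_mix(rho)

assert np.allclose(out_k, out_u)               # the two representations agree
assert np.isclose(np.trace(out_k).real, 1.0)   # the channel preserves the trace
```

Both representations shrink the off-diagonal element by $\sqrt{1-\lambda}$ while leaving the populations untouched, which is exactly the dephasing action of the PD channel.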
- Apr 20 2017 quant-ph cond-mat.other arXiv:1704.05504v1: Silicon-based quantum logic is a promising technology for implementing universal quantum computing. It is widely believed that a millikelvin cryogenic environment will be necessary to accommodate silicon-based qubits. This prompts the question of the ultimate scalability of the technology, given the finite cooling capacity of refrigeration systems. In this work, we answer this question by studying energy dissipation due to interactions between nuclear spin impurities and qubit control pulses. We demonstrate that this interaction constrains the sustainable number of single-qubit operations per second for a given cooling capacity. Our results indicate that a state-of-the-art dilution refrigerator can, in principle, accommodate operations on millions of qubits before thermal energy dissipation becomes a problem.
- We analyze a dataset providing complete information on the effective plays of thousands of music listeners over several months. Our analysis confirms a number of properties previously highlighted by research based on interviews and questionnaires, but also uncovers new statistical patterns at both the individual and collective levels. In particular, we show that individuals follow common listening rhythms characterized by the same fluctuations, alternating heavy and light listening periods, and can be classified into four groups of similar sizes according to their temporal habits --- 'early birds', 'working hours listeners', 'evening listeners' and 'night owls'. We provide a detailed radioscopy of the listeners' interplay between repeated listening and discovery of new content. We show that different genres encourage different listening habits, from Classical or Jazz music, with a more balanced listening among different songs, to Hip Hop and Dance, with a more heterogeneous distribution of plays. Finally, we provide measures of how distant people are from each other in terms of common songs. In particular, we show that the number of songs $S$ a DJ should play to a random audience of size $N$ such that everyone hears at least one song he/she currently listens to is of the form $S\sim N^\alpha$, where the exponent depends on the music genre and lies in the range $[0.5,0.8]$. More generally, our results show that the recent access to virtually infinite catalogs of songs does not promote exploration for novelty; rather, most users favor repetition of the same songs.
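To get a feel for the scaling law $S \sim N^\alpha$, here is a quick numerical illustration; the audience size and the specific exponents are hypothetical, with only the range $[0.5, 0.8]$ coming from the text:

```python
# N and the alpha values below are illustrative choices, not figures from the paper.
N = 10_000  # hypothetical audience size
for alpha in (0.5, 0.65, 0.8):
    S = N ** alpha
    print(f"alpha = {alpha:.2f}  ->  S ~ {S:,.0f} songs")
```

Even at the top of the exponent range, the required playlist grows much more slowly than the audience itself, which is what makes the sublinear scaling notable.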
- Apr 20 2017 quant-ph arXiv:1704.05637v2: Entangled quantum states, such as N00N states, are of major importance for quantum technologies due to their quantum-enhanced performance. At the same time, their quantum correlations are relatively vulnerable when they are subjected to imperfections. Therefore, it is crucial to determine under which circumstances their distinct quantum features can be exploited. In this paper, we study the entanglement property of noisy N00N states. This class of states is a generalization of N00N states including various attenuation effects, such as mixing, constant or fluctuating losses, and dephasing. To verify their entanglement, we pursue two strategies: detection-based entanglement witnesses and entanglement quasiprobabilities. Both methods result from our solution of so-called separability eigenvalue equations. In particular, the entanglement quasiprobabilities allow for a full entanglement characterization. As examples of our general treatment, the cases of N00N states subjected to Gaussian dephasing and fluctuating atmospheric losses are explicitly studied. In any correlated fluctuating loss channel, entanglement is found to survive for non-zero transmissivity. In addition, an extension of our approach to multipartite systems is given, and the relation to the quantum-optical nonclassicality in phase-space is discussed.
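For context, an ideal $N$-photon two-mode N00N state (before any noise) is the superposition

```latex
|\mathrm{N00N}\rangle \;=\; \frac{1}{\sqrt{2}}\left( |N,0\rangle + |0,N\rangle \right),
```

i.e., all $N$ photons in one mode or all in the other; the noisy N00N states studied in the paper generalize this by incorporating mixing, losses, and dephasing.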
- This review provides a brief introduction to the physics of coupled exciton-plasmon systems and to the theoretical description and experimental manifestation of such phenomena, followed by an account of the state-of-the-art methodology for the numerical simulation of such phenomena, supplemented by a number of FORTRAN codes with which interested readers can acquaint themselves with the practice of such simulations. Applications to CW light scattering as well as transient response and relaxation are described. Particular attention is given to the so-called strong coupling limit, where the hybrid exciton-plasmon nature of the system response is strongly expressed. While traditional descriptions of such phenomena usually rely on analysis of the electromagnetic response of inhomogeneous dielectric environments that individually support plasmon and exciton excitations, here we also explore the consequences of a more detailed description of the molecular environment in terms of its quantum density matrix (applied at the mean-field approximation level). Such a description makes it possible to account for characteristics that cannot be described by the dielectric response model: the effects of dephasing on the molecular response on the one hand, and nonlinear response on the other. It also highlights important ingredients still missing from the numerical approach, in particular its limitation to a classical description of the radiation field and its reliance on a mean-field description of the many-body molecular system. We end our review with an outlook to the near future, where these limitations will be addressed and novel applications of the numerical approach will be pursued.
- We introduce the Self-Annotated Reddit Corpus (SARC), a large corpus for sarcasm research and for training and evaluating systems for sarcasm detection. The corpus has 1.3 million sarcastic statements -- 10 times more than any previous dataset -- and many times more instances of non-sarcastic statements, allowing for learning in both balanced and unbalanced label regimes. Each statement is furthermore self-annotated -- sarcasm is labeled by the author, not an independent annotator -- and provided with user, topic, and conversation context. We evaluate the corpus for accuracy, construct benchmarks for sarcasm detection, and evaluate baseline methods.
- Apr 20 2017 cs.AI arXiv:1704.05539v1: We introduce the first deep reinforcement learning agent that learns to beat Atari games with the aid of natural language instructions. The agent uses a multimodal embedding between environment observations and natural language to self-monitor progress through a list of English instructions, granting itself reward for completing instructions in addition to increasing the game score. Our agent significantly outperforms Deep Q-Networks (DQNs), Asynchronous Advantage Actor-Critic (A3C) agents, and the best agents posted to OpenAI Gym on what is often considered the hardest Atari 2600 environment: Montezuma's Revenge.
- Predicting personality is essential for social applications supporting human-centered activities, yet prior methods that model personality from users' written text require too much input data to be realistically used in the context of social media. In this work, we aim to drastically reduce the data requirement for personality modeling and develop a model that is applicable to most users on Twitter. Our model integrates Word Embedding features with Gaussian Process regression. Based on an evaluation of over 1.3K users on Twitter, we find that our model achieves comparable or better accuracy than state-of-the-art techniques with 8 times less data.
- We study work extraction (defined as the difference between the initial and the final energy) in noninteracting and (effectively) weakly interacting isolated fermionic quantum lattice systems in one dimension, which undergo a sequence of quenches and equilibration. The systems are divided into two parts, which we identify as the subsystem of interest and the bath. We extract work by quenching the on-site potentials in the subsystem, letting the entire system equilibrate, and returning to the initial parameters in the subsystem using a quasi-static process (the bath is never acted upon). We select initial states that are direct products of thermal states of the subsystem and the bath, and consider equilibration to the generalized Gibbs ensemble (GGE, noninteracting case) and to the Gibbs ensemble (GE, weakly interacting case). We identify the class of quenches that, in the thermodynamic limit, results in GE and GGE entropies after the quench that are identical to those in the initial state (quenches that do not produce entropy). Those quenches guarantee maximal work extraction when thermalization occurs. We show that the same remains true in the presence of integrable dynamics that results in equilibration to the GGE.
- In this paper, we analyze the connection between the fidelity susceptibility, the holographic complexity, and the thermodynamic volume. We regularize the fidelity susceptibility and the holographic complexity by subtracting the contribution of the background AdS spacetime from that of the deformed AdS spacetime. We demonstrate that the regularized fidelity susceptibility has the same behavior as the thermodynamic volume, while the regularized complexity behaves very differently. As the information dual to different volumes in the bulk is measured by the fidelity susceptibility and the holographic complexity, this paper establishes a connection between thermodynamics and the information dual to a volume.
- We evaluate entropy production in a photovoltaic cell modeled by four electronic levels resonantly coupled to thermally populated field modes at different temperatures. We use a recently proposed formalism, the so-called multiple parallel worlds, to consistently address the nonlinearity of entropy in terms of the density matrix. Our result shows that entropy production is the difference between two flows: a semiclassical flow that depends linearly on occupation probabilities, and another flow that depends nonlinearly on quantum coherence and has no semiclassical analog. We show that entropy production in the cell depends on the environmentally induced decoherence time and the energy detuning. We characterize regimes where a reverse flow of information takes place, from the cold to the hot bath. Interestingly, we identify a lower bound on entropy production, which sets limitations on the statistics of dissipated heat in the cell.
- Apr 20 2017 cs.CV arXiv:1704.05838v1: In this paper, we propose an effective face completion algorithm using a deep generative model. In contrast to well-studied background completion, the face completion task is more challenging, as it often requires generating semantically new pixels for the missing key components (e.g., eyes and mouths), which contain large appearance variations. Unlike existing nonparametric algorithms that search for patches to synthesize, our algorithm directly generates contents for missing regions based on a neural network. The model is trained with a combination of a reconstruction loss, two adversarial losses, and a semantic parsing loss, which together ensure pixel faithfulness and local-global content consistency. With extensive experimental results, we demonstrate qualitatively and quantitatively that our model is able to deal with large areas of missing pixels in arbitrary shapes and generate realistic face completion results.
- We present a novel mapping framework for robot navigation featuring a multi-level querying system capable of rapidly producing representations as diverse as a 3D voxel grid, a 2.5D height map, and a 2D occupancy grid. These are inherently embedded in a memory- and time-efficient core data structure organized as a Tree of SkipLists. Compared to the well-known Octree representation, our approach exhibits better time efficiency, thanks to its simple and highly parallelizable computational structure, and a similar memory footprint when mapping large workspaces. Unusually among mapping frameworks for robot navigation, ours supports real-time erosion and re-integration of measurements upon reception of optimized poses from the sensor tracker, so as to continuously improve the accuracy of the map.
- Apr 20 2017 cs.CV arXiv:1704.05831v5: We propose a hierarchical approach for making long-term predictions of future video frames. Long-term video prediction is difficult to perform by recurrently observing the predicted frames, because small errors in pixel space amplify exponentially as predictions are made deeper into the future. To avoid these inherent compounding errors in recursive pixel-level prediction, we propose to first estimate high-level structure in the input frames, then predict how that structure evolves in the future; finally, by observing a single frame from the past and the predicted high-level structure, we construct the future frames without having to observe any pixel-level predictions. Our approach thus prevents pixel-level error propagation by removing the need to observe the predicted frames. Our model is built from a combination of LSTM and analogy-based encoder-decoder convolutional neural networks, which predict the video structure and generate the future frames, respectively. In experiments, our model is evaluated on the Human3.6M and Penn Action datasets on the task of long-term pixel-level video prediction of humans performing actions, and it demonstrates significantly better results than the state of the art.
- Apr 20 2017 q-bio.NC arXiv:1704.05826v1: Community detection algorithms have been widely used to study the organization of complex systems like the brain. A principal appeal of these techniques is their ability to identify a partition of brain regions (or nodes) into communities, where nodes within a community are densely interconnected. In their simplest application, community detection algorithms are agnostic to the presence of community hierarchies, yet a common characteristic of many neural systems is a nested hierarchy. To address this limitation, we employ a multi-scale extension of a community detection technique known as modularity maximization, and we apply the tool to both synthetic graphs and graphs derived from human structural and functional imaging data. Our multi-scale community detection algorithm links a graph to copies of itself across neighboring topological scales, thereby becoming sensitive to community organization that is conserved across neighboring levels of the hierarchy. We demonstrate that this method allows for a better characterization of the topological inhomogeneities of the graph's hierarchy by providing a local (node-level) measure of community stability and inter-scale reliability across topological scales. We compare the brain's structural and functional network architectures and demonstrate that structural graphs display a wider range of topological scales than functional graphs. Finally, we build a multimodal multiplex graph that combines structural and functional connectivity in a single model, and we identify the topological scales where resting-state functional connectivity and the underlying structural connectivity show similar versus unique hierarchical community architecture. Together, our results showcase the advantages of the multi-scale community detection algorithm for studying hierarchical community structure in brain graphs, and they illustrate its utility in modeling multimodal neuroimaging data.
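As background for modularity maximization, the single-scale Newman modularity of a partition is $Q = \frac{1}{2m}\sum_{ij}\bigl(A_{ij} - \frac{k_i k_j}{2m}\bigr)\,\delta(c_i, c_j)$. A minimal sketch on a toy graph follows; the multi-scale, multiplex machinery of the paper is not reproduced here:

```python
import numpy as np

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge (2,3)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

k = A.sum(axis=1)   # node degrees
m = A.sum() / 2     # number of edges

def modularity(communities):
    """Newman modularity Q = (1/2m) sum_ij (A_ij - k_i k_j / 2m) delta(c_i, c_j)."""
    Q = 0.0
    for comm in communities:
        for i in comm:
            for j in comm:
                Q += A[i, j] - k[i] * k[j] / (2 * m)
    return Q / (2 * m)

Q = modularity([{0, 1, 2}, {3, 4, 5}])
print(round(Q, 4))  # 0.3571 for this two-community partition
```

Grouping the two triangles yields $Q = 5/14 \approx 0.357$, while placing all nodes in one community gives $Q = 0$; modularity maximization searches for the partition with the largest $Q$, and the multi-scale extension in the paper repeats this across coupled resolution scales.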
- Apr 20 2017 stat.ML arXiv:1704.05820v2: We study the problem of interactively learning a binary classifier using noisy labeling and pairwise comparison oracles, where the comparison oracle answers which of two given instances is more likely to be positive. Learning from such oracles has multiple applications where obtaining direct labels is harder but pairwise comparisons are easier, and an algorithm can leverage both types of oracles. In this paper, we attempt to characterize how access to the easier comparison oracle helps improve the label and total query complexity. We show that the comparison oracle reduces the learning problem to that of learning a threshold function. We then present an algorithm that interactively queries the label and comparison oracles, and we characterize its query complexity under Tsybakov and adversarial noise conditions for the comparison and labeling oracles. Our lower bounds show that our label and total query complexity is almost optimal.
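The reduction to threshold learning can be sketched in the noiseless case: order the instances with the comparison oracle alone, then binary-search the decision boundary with $O(\log n)$ label queries. The oracles and names below are hypothetical illustrations of the idea, not the paper's algorithm (which handles noisy oracles):

```python
import functools

# Hypothetical noiseless oracles for a 1-D threshold classifier:
# label(x) = 1 iff x >= THRESHOLD; compare says which instance is
# more likely to be positive (here simply the larger one).
THRESHOLD = 0.62

def label_oracle(x):
    return 1 if x >= THRESHOLD else 0

def compare_oracle(x, y):
    return -1 if x < y else 1  # y is more likely positive when x < y

def learn_threshold(points):
    # 1) Order all points using only the comparison oracle (no labels used).
    ordered = sorted(points, key=functools.cmp_to_key(compare_oracle))
    # 2) Binary-search the boundary with O(log n) label queries.
    lo, hi = 0, len(ordered)
    while lo < hi:
        mid = (lo + hi) // 2
        if label_oracle(ordered[mid]):
            hi = mid
        else:
            lo = mid + 1
    return ordered[lo] if lo < len(ordered) else None  # smallest positive point

points = [0.11, 0.95, 0.48, 0.63, 0.77, 0.30, 0.59]
print(learn_threshold(points))  # 0.63, the smallest point labeled positive
```

The label complexity drops from $O(n)$ (labeling every point) to $O(\log n)$, which is the kind of saving the paper quantifies under noise.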
- Apr 20 2017 cs.CV arXiv:1704.05817v1: It is difficult to recover the motion field from real-world footage given a mixture of camera shake and other photometric effects. In this paper, we propose a hybrid framework that interleaves a Convolutional Neural Network (CNN) with a traditional optical flow energy. We first construct a CNN architecture using a novel learnable directional filtering layer. This layer encodes the angle and distance similarity matrix between blur and camera motion, which enhances the blur features of camera-shake footage. The proposed CNNs are then integrated into an iterative optical flow framework, which makes it possible to model and solve both the blind deconvolution and the optical flow estimation problems simultaneously. Our framework is trained end-to-end on a synthetic dataset and yields precision and performance competitive with state-of-the-art approaches.
- Access to collective excitations lies at the heart of our understanding of quantum many-body systems. We study the Higgs and Goldstone modes in a supersolid quantum gas that is created by coupling a Bose-Einstein condensate symmetrically to two optical cavities. The cavity fields form a U(1)-symmetric order parameter that can be modulated and monitored along both quadratures in real time. This enables us to measure the excitation energies across the superfluid-supersolid phase transition, establish their amplitude and phase nature, as well as characterize their dynamics from an impulse response. Furthermore, we can give a tunable mass to the Goldstone mode at the crossover between continuous and discrete symmetry by changing the coupling of the quantum gas with either cavity.
- We propose a general framework called Network Dissection for quantifying the interpretability of latent representations of CNNs by evaluating the alignment between individual hidden units and a set of semantic concepts. Given any CNN model, the proposed method draws on a broad data set of visual concepts to score the semantics of hidden units at each intermediate convolutional layer. The units with semantics are given labels across a range of objects, parts, scenes, textures, materials, and colors. We use the proposed method to test the hypothesis that interpretability of units is equivalent to random linear combinations of units, then we apply our method to compare the latent representations of various networks when trained to solve different supervised and self-supervised training tasks. We further analyze the effect of training iterations, compare networks trained with different initializations, examine the impact of network depth and width, and measure the effect of dropout and batch normalization on the interpretability of deep visual representations. We demonstrate that the proposed method can shed light on characteristics of CNN models and training methods that go beyond measurements of their discriminative power.
- Apr 20 2017 cs.DS arXiv:1704.05795v1: A sum where each of the $N$ summands can be independently chosen from two choices yields $2^N$ possible summation outcomes. There is an $\mathcal{O}(K^2)$-algorithm that finds the $K$ smallest/largest of these sums by evading the enumeration of all sums.
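One standard way to realize such a selection (not necessarily the paper's algorithm) writes each sum as a base sum of per-position minima plus a subset of nonnegative deltas, and then pops the $K$ smallest candidates from a heap, never touching most of the $2^N$ outcomes:

```python
import heapq

def k_smallest_sums(choices, K):
    """K smallest of the 2^N sums where the i-th summand is one of choices[i].

    Sketch of a generic heap-based selection: start from the all-minima sum
    and explore delta increments in sorted order.
    """
    base = sum(min(a, b) for a, b in choices)
    deltas = sorted(abs(a - b) for a, b in choices)
    result = [base]
    if deltas:
        heap = [(base + deltas[0], 0)]  # (candidate sum, index of last delta used)
        while heap and len(result) < K:
            s, i = heapq.heappop(heap)
            result.append(s)
            if i + 1 < len(deltas):
                heapq.heappush(heap, (s + deltas[i + 1], i + 1))             # extend subset
                heapq.heappush(heap, (s - deltas[i] + deltas[i + 1], i + 1)) # swap last delta
    return result[:K]

print(k_smallest_sums([(1, 4), (2, 2), (5, 3)], 4))  # [6, 6, 8, 8]
```

Because the deltas are sorted, each heap pop yields the next smallest subset sum, so only $O(K)$ heap states are ever created; the duplicate 6 reflects the zero delta from the pair (2, 2).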
- Apr 20 2017 stat.ML arXiv:1704.05786v2: Variational inference approximates the posterior distribution of a probabilistic model with a parameterized density by maximizing a lower bound on the model evidence. Modern solutions fit a flexible approximation with stochastic gradient descent, using Monte Carlo approximation for the gradients. This enables variational inference for arbitrary differentiable probabilistic models, and consequently makes variational inference feasible for probabilistic programming languages. In this work we develop more efficient inference algorithms for the task by considering importance sampling estimates for the gradients. We show how the gradient with respect to the approximation parameters can often be evaluated efficiently without needing to re-compute gradients of the model itself, and then proceed to derive practical algorithms that use importance sampled estimates to speed up computation. We present an importance sampled stochastic gradient descent scheme that outperforms standard stochastic gradient descent by a clear margin for a range of models, and provide a justifiable variant of stochastic average gradients for variational inference.
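The core identity behind such estimators is $\mathbb{E}_p[f(x)] = \mathbb{E}_q\!\left[\tfrac{p(x)}{q(x)} f(x)\right]$, which lets one reuse samples from a proposal $q$ instead of resampling from $p$. A minimal self-contained illustration (the distributions, sample size, and function are arbitrary choices, not from the paper):

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def importance_estimate(f, n=200_000, seed=0):
    """Estimate E_p[f(x)] for p = N(0, 1) using samples from q = N(0, 2^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 2.0)  # sample from the proposal q
        w = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, 2.0)  # weight p(x)/q(x)
        total += w * f(x)
    return total / n

est = importance_estimate(lambda x: x * x)  # E_p[x^2] = 1 under p = N(0, 1)
print(round(est, 2))
```

In the paper's setting, $f$ would be a gradient of the variational objective, so old samples can be reweighted instead of recomputing model gradients at every step.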
- Apr 20 2017 cs.CL arXiv:1704.05781v1: Distributional semantic models learn vector representations of words through the contexts they occur in. Although the choice of context (which often takes the form of a sliding window) has a direct influence on the resulting embeddings, the exact role of this model component is still not fully understood. This paper presents a systematic analysis of context windows based on a set of four distinct hyper-parameters. We train continuous Skip-Gram models on two English-language corpora for various combinations of these hyper-parameters, and evaluate them on both lexical similarity and analogy tasks. Notable experimental results are the positive impact of cross-sentential contexts and the surprisingly good performance of right-context windows.
- Apr 20 2017 cs.CV arXiv:1704.05776v1: Most of the recent successful methods for accurate object detection and localization use variants of the R-CNN style two-stage Convolutional Neural Network (CNN), in which plausible regions are proposed in a first stage and then refined in a second stage. Despite their simplicity of training and efficiency in deployment, single-stage detection methods have not been as competitive when evaluated on benchmarks that consider mAP at high IoU thresholds. In this paper, we propose a novel single-stage, end-to-end trainable object detection network to overcome this limitation. We achieve this by introducing a Recurrent Rolling Convolution (RRC) architecture over multi-scale feature maps to construct object classifiers and bounding box regressors that are "deep in context". We evaluate our method on the challenging KITTI dataset, which measures methods at an IoU threshold of 0.7. We show that with RRC, a single reduced VGG-16 based model already significantly outperforms all previously published results. At the time of writing, our models ranked first in KITTI car detection (the hard level), first in cyclist detection, and second in pedestrian detection. These results were not reached by previous single-stage methods. The code is publicly available.
- Apr 20 2017 cs.CV arXiv:1704.05775v2 People detection in single 2D images has improved greatly in recent years. However, comparatively little of this progress has percolated into multi-camera multi-people tracking algorithms, whose performance still degrades severely when scenes become very crowded. In this work, we introduce a new architecture that combines Convolutional Neural Nets and Conditional Random Fields to explicitly model those ambiguities. One of its key ingredients is a set of high-order CRF terms that model potential occlusions and give our approach its robustness even when many people are present. Our model is trained end-to-end, and we show that it outperforms several state-of-the-art algorithms on challenging scenes.
- This technical report describes the derivation of the asymptotic eigenvalue distribution for causal 2D-AR models under an upscaling scenario. Specifically, it tackles the analytical derivation of the asymptotic eigenvalue distribution of the sample autocorrelation matrix corresponding to genuine and upscaled images. It also includes the pseudocode of the derived approaches for resampling detection and resampling factor estimation that are based on this analysis.
- Apr 20 2017 math.AT arXiv:1704.05770v1 Homotopy type theory is a version of Martin-Löf type theory taking advantage of its homotopical models. In particular, we can use and construct objects of homotopy theory and reason about them using higher inductive types. In this article, we construct the real projective spaces, key players in homotopy theory, as certain higher inductive types in homotopy type theory. The classical definition of RP(n), as the quotient space identifying antipodal points of the n-sphere, does not translate directly to homotopy type theory. Instead, we define RP(n) by induction on n simultaneously with its tautological bundle of 2-element sets. As the base case, we take RP(-1) to be the empty type. In the inductive step, we take RP(n+1) to be the mapping cone of the projection map of the tautological bundle of RP(n), and we use its universal property and the univalence axiom to define the tautological bundle on RP(n+1). By showing that the total space of the tautological bundle of RP(n) is the n-sphere, we retrieve the classical description of RP(n+1) as RP(n) with an (n+1)-cell attached to it. The infinite-dimensional real projective space, defined as the sequential colimit of the RP(n) with the canonical inclusion maps, is equivalent to the Eilenberg-MacLane space K(Z/2Z,1), which here arises as the subtype of the universe consisting of 2-element types. Indeed, the infinite-dimensional projective space classifies 0-sphere bundles, which one can think of as synthetic line bundles. These constructions further illustrate the utility of homotopy type theory, including the interplay of type-theoretic and homotopy-theoretic ideas.
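The simultaneous induction described above can be sketched schematically (in our own notation, writing cov_n for the tautological bundle and cone for the mapping cone):

```latex
% Base case and inductive step, defined simultaneously with the bundle:
\mathbb{RP}^{-1} :\equiv \mathbf{0}, \qquad
\mathbb{RP}^{n+1} :\equiv \mathrm{cone}\!\left( \mathrm{pr}_1 \colon
  \textstyle\sum_{x : \mathbb{RP}^{n}} \mathrm{cov}_n(x)
  \longrightarrow \mathbb{RP}^{n} \right)
% cov_{n+1} on the cone is obtained from the universal property of the
% pushout together with univalence; the equivalence
% \sum_{x : \mathbb{RP}^{n}} cov_n(x) \simeq S^n
% recovers the classical (n+1)-cell attachment.
```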
- Apr 20 2017 stat.AP arXiv:1704.05767v1 This study aims to analyze methodologies that can be used to estimate the total number of unemployed, as well as the unemployment rates, for the 28 regions of Portugal designated as NUTS III regions, using model-based approaches as compared to the direct estimation methods currently employed by INE (National Statistical Institute of Portugal). Model-based methods, often known as small area estimation methods (Rao, 2003), "borrow strength" from neighbouring regions and, in doing so, aim to compensate for the small sample sizes often observed in these areas. Consequently, it is generally accepted that model-based methods tend to produce estimates with less variation. Another benefit of employing model-based methods is the possibility of including auxiliary information in the form of variables of interest and latent random structures. This study focuses on the application of Bayesian hierarchical models to the Portuguese Labor Force Survey data from the 1st quarter of 2011 to the 4th quarter of 2013. Three different data modeling strategies are considered and compared: modeling the total unemployed through Poisson, Binomial and Negative Binomial models; modeling rates using a Beta model; and modeling the three states of the labor market (employed, unemployed and inactive) with a Multinomial model. The implementation of these models is based on the Integrated Nested Laplace Approximation (INLA) approach, except for the Multinomial model, which is implemented using Markov Chain Monte Carlo (MCMC). Finally, we compare the performance of these models, as well as their results against those obtained by direct estimation methods at the NUTS III level.
- Vehicular communications have attracted more and more attention recently from both industry and academia due to their strong potential to enhance road safety, improve traffic efficiency, and provide rich on-board information and entertainment services. In this paper, we discuss fundamental physical layer issues that enable efficient vehicular communications and present a comprehensive overview of the state-of-the-art research. We first introduce vehicular channel characteristics and modeling, which are the key underlying features differentiating vehicular communications from other types of wireless systems. We then present schemes to estimate the time-varying vehicular channels and various modulation techniques to deal with high-mobility channels. After reviewing resource allocation for vehicular communications, we discuss the potential to enable vehicular communications over the millimeter wave bands. Finally, we identify the challenges and opportunities associated with vehicular communications.
- Apr 20 2017 cs.CL arXiv:1704.05742v1 Neural network models have shown promise for multi-task learning, which focuses on learning shared layers to extract common, task-invariant features. However, in most existing approaches, the extracted shared features are prone to contamination by task-specific features or by noise brought in by other tasks. In this paper, we propose an adversarial multi-task learning framework that prevents the shared and private latent feature spaces from interfering with each other. We conduct extensive experiments on 16 different text classification tasks, which demonstrate the benefits of our approach. Moreover, we show that the shared knowledge learned by our model can be regarded as off-the-shelf knowledge and easily transferred to new tasks. The datasets of all 16 tasks are publicly available at http://nlp.fudan.edu.cn/data/
- Apr 20 2017 cs.CV arXiv:1704.05737v2 This paper addresses the task of segmenting moving objects in unconstrained videos. We introduce a novel two-stream neural network with an explicit memory module to achieve this. The two streams of the network encode spatial and temporal features of a video sequence respectively, while the memory module captures the evolution of objects over time. This "visual memory" of the video, i.e., a joint representation of all the video frames, is realized with convolutional gated recurrent units, which allow spatial information to be propagated over time and are learned from a small number of training video sequences. Given a video frame as input, our approach assigns each pixel an object or background label based on the learned spatio-temporal features as well as the visual memory specific to the video, acquired automatically without any manually annotated frames. We evaluate our method extensively on two benchmarks, the DAVIS and Freiburg-Berkeley motion segmentation datasets, and show state-of-the-art results. For example, our approach outperforms the top method on the DAVIS dataset by nearly 6%. We also provide an extensive ablative analysis to investigate the influence of each component in the proposed framework.
- Apr 20 2017 cs.SI arXiv:1704.05735v2 Matrix Factorization (MF) is a very popular method for recommendation systems. It assumes that the underlying rating matrix is low-rank. However, this assumption can be too restrictive to capture complex relationships and interactions among users and items. Recently, Local LOw-Rank Matrix Approximation (LLORMA) has been shown to be very successful in addressing this issue. It instead assumes the rating matrix is composed of a number of low-rank submatrices constructed from subsets of similar users and items. Although LLORMA outperforms MF, how to construct such submatrices remains an open problem. Motivated by the availability of rich social connections in today's recommendation systems, we propose a novel framework, Social LOcal low-rank Matrix Approximation (SLOMA), to address this problem. To the best of our knowledge, SLOMA is the first work to incorporate social connections into the local low-rank framework. Furthermore, we enhance SLOMA by applying social regularization to the submatrix factorizations, yielding SLOMA++. The proposed model can therefore benefit from both social recommendation and the local low-rank assumption. Experimental results on two real-world datasets, Yelp and Douban, demonstrate the superiority of the proposed models over LLORMA and MF.
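The low-rank factorization underlying both MF and each LLORMA/SLOMA submatrix can be sketched as plain SGD on the observed entries (a generic sketch, not the authors' implementation; SLOMA++'s social regularizer is omitted):

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02,
              epochs=200, seed=0):
    """Fit R ~ U V^T by stochastic gradient descent on the observed
    (user, item, rating) triples, with L2 regularization."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(U[u][f] * V[i][f] for f in range(k))
            for f in range(k):
                # Simultaneous update of both factors for this entry.
                U[u][f], V[i][f] = (U[u][f] + lr * (err * V[i][f] - reg * U[u][f]),
                                    V[i][f] + lr * (err * U[u][f] - reg * V[i][f]))
    return U, V

# A tiny observed rating matrix (3 users x 3 items, 5 known entries).
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0)]
U, V = factorize(ratings, n_users=3, n_items=3)
```

LLORMA and SLOMA apply this same factorization independently to each local submatrix of similar users and items, then combine the local predictions.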
- Bias in online information has recently become a pressing issue, with search engines, social networks and recommendation services being accused of exhibiting some form of bias. In this vision paper, we make the case for a systematic approach towards measuring bias. To this end, we discuss formal measures for quantifying the various types of bias, we outline the system components necessary for realizing them, and we highlight the related research challenges and open problems.
- While deep learning is remarkably successful on perceptual tasks, it was also shown to be vulnerable to adversarial perturbations of the input. These perturbations denote noise added to the input that was generated specifically to fool the system while being quasi-imperceptible for humans. More severely, there even exist universal perturbations that are input-agnostic but fool the network on the majority of inputs. While recent work has focused on image classification, this work proposes attacks against semantic image segmentation: we present an approach for generating (universal) adversarial perturbations that make the network yield a desired target segmentation as output. We show empirically that there exist barely perceptible universal noise patterns which result in nearly the same predicted segmentation for arbitrary inputs. Furthermore, we also show the existence of universal noise which removes a target class (e.g., all pedestrians) from the segmentation while leaving the segmentation mostly unchanged otherwise.
- We study the problem of mapping an input image to a tied pair consisting of a vector of parameters and an image that is created using a graphical engine from the vector of parameters. The mapping's objective is to have the output image as similar as possible to the input image. During training, no supervision is given in the form of matching inputs and outputs. This learning problem extends two literature problems: unsupervised domain adaptation and cross domain transfer. We define a generalization bound that is based on discrepancy, and employ a GAN to implement a network solution that corresponds to this bound. Experimentally, our method is shown to solve the problem of automatically creating avatars.
- Apr 20 2017 cs.DS arXiv:1704.05682v1 We consider the problem of implementing a space-efficient dynamic trie, with an emphasis on good practical performance. For a trie with $n$ nodes over an alphabet of size $\sigma$, the information-theoretic lower bound is $n \log \sigma + O(n)$ bits. The Bonsai data structure is a compact trie proposed by Darragh et al. (Softw., Pract. Exper. 23(3), 1993, p. 277-291). Its disadvantages include the user having to specify an upper bound $M$ on the trie size in advance (which cannot be changed easily after initialization), a space usage of $M \log \sigma + O(M \log \log M)$ (which is asymptotically non-optimal for smaller $\sigma$ or if $n \ll M$) and a lack of support for deletions. It supports traversal and update operations in $O(1/\epsilon)$ expected time (based on assumptions about the behaviour of hash functions), where $\epsilon = (M-n)/M$, and it has excellent speed performance in practice. We propose an alternative, m-Bonsai, that addresses the above problems, obtaining a trie that uses $(1+\beta) n (\log \sigma + O(1))$ bits in expectation, and supports traversal and update operations in $O(1/\beta)$ expected time and $O(1/\beta^2)$ amortized expected time, for any user-specified parameter $\beta > 0$ (again based on assumptions about the behaviour of hash functions). We give an implementation of m-Bonsai which uses considerably less memory and is slightly faster than the original Bonsai.
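The core Bonsai idea of representing a trie purely through a hash table keyed by (parent node, symbol) can be sketched as follows (a plain-dictionary illustration of the principle only; the compact structures above hash into a fixed array of $M$ slots rather than a growable dictionary):

```python
class HashTrie:
    """Trie whose only storage is a map (parent_id, symbol) -> child_id.
    Bonsai makes this map compact via open addressing into M slots."""

    def __init__(self):
        self.children = {}  # (parent_id, symbol) -> child_id
        self.next_id = 1    # node 0 is the root

    def get_child(self, node, symbol):
        """Child transition: a single expected-O(1) hash lookup."""
        return self.children.get((node, symbol))

    def add_child(self, node, symbol):
        child = self.children.setdefault((node, symbol), self.next_id)
        if child == self.next_id:  # a fresh node was actually created
            self.next_id += 1
        return child

    def insert(self, word):
        node = 0
        for symbol in word:
            node = self.add_child(node, symbol)
        return node

t = HashTrie()
t.insert("car")
t.insert("cat")  # shares the "ca" prefix path with "car"
```

Because nodes have no explicit child pointers, the space cost is one hash-table entry per trie node, which is where the $n \log \sigma$-style bounds come from.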
- Apr 20 2017 cs.FL arXiv:1704.05676v3 Automata learning is a technique that has successfully been applied in verification, with the automaton type varying depending on the application domain. Adaptations of automata learning algorithms for increasingly complex types of automata have had to be developed from scratch because there has been no abstract theory offering guidelines. This makes it hard to devise such algorithms, and it obscures their correctness proofs. We introduce a simple category-theoretic formalism that provides an appropriately abstract foundation for studying automata learning. Furthermore, our framework establishes formal relations between algorithms for learning, testing, and minimization. We illustrate its generality with two examples: deterministic and weighted automata.
- Apr 20 2017 cs.CV arXiv:1704.05674v1 We address an essential problem in computer vision, that of unsupervised object segmentation in video, where a main object of interest in a video sequence should be automatically separated from its background. An efficient solution to this task would enable large-scale video interpretation at a high semantic level in the absence of costly manually labeled ground truth. We propose an efficient unsupervised method for generating foreground object soft-segmentation masks based on automatic selection and learning from highly probable positive features. We show that such features can be selected efficiently by taking into consideration the spatio-temporal, appearance and motion consistency of the object over the whole observed sequence. We also emphasize the role of the contrasting properties between the foreground object and its background. Our model is created in two stages: we start from pixel-level analysis, on top of which we add a regression model trained on a descriptor that considers information over groups of pixels and is both discriminative and invariant to many changes that the object undergoes throughout the video. We also present theoretical properties of our unsupervised learning method, which, under some mild constraints, is guaranteed to learn a correct discriminative classifier even in the unsupervised case. Our method achieves competitive and even state-of-the-art results on the challenging Youtube-Objects and SegTrack datasets, while being at least one order of magnitude faster than the competition. We believe that the competitive performance of our method in practice, along with its theoretical properties, constitutes an important step towards solving unsupervised discovery in video.
- Music emotion recognition (MER) is usually regarded as a multi-label tagging task, in which each segment of music can inspire specific emotion tags. Most researchers extract acoustic features from music and explore the relations between these features and their corresponding emotion tags. Given the inconsistency of the emotions that the same music segment inspires in different listeners, identifying the key acoustic features that truly affect emotions is a challenging task. In this paper, we propose a novel MER method that applies a deep convolutional neural network (CNN) to music spectrograms, which contain both the original time- and frequency-domain information. With the proposed method, no additional effort on extracting specific features is required; this is left to the training procedure of the CNN model. Experiments are conducted on the standard CAL500 and CAL500exp datasets. Results show that, on both datasets, the proposed method outperforms state-of-the-art methods.
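The spectrogram input can be sketched in pure Python (our own minimal short-time DFT, not the authors' preprocessing pipeline): a windowed DFT of successive frames yields the time-frequency image that the CNN consumes.

```python
import cmath
import math

def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram: slide a Hann-windowed frame over the signal
    and take the DFT magnitudes of the non-negative frequency bins."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Hann window reduces spectral leakage at the frame edges.
        win = [x * 0.5 * (1 - math.cos(2 * math.pi * n / (frame_len - 1)))
               for n, x in enumerate(frame)]
        spectrum = [abs(sum(w * cmath.exp(-2j * math.pi * k * n / frame_len)
                            for n, w in enumerate(win)))
                    for k in range(frame_len // 2 + 1)]
        frames.append(spectrum)
    return frames  # shape: (num_frames, frame_len // 2 + 1)

# A pure sine at DFT bin 8 of a 64-point frame concentrates energy there.
sig = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
spec = spectrogram(sig)
```

In practice one would use an FFT library and a mel or log scaling, but the resulting 2D array is exactly the kind of image-like input a CNN can learn emotion-relevant features from.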
- Apr 20 2017 cs.LG arXiv:1704.05646v1 Despite being so vital to the success of Support Vector Machines, the principle of separating-margin maximisation is not used in deep learning. We show that minimising margin variance, rather than maximising the margin, is more suitable for improving generalisation in deep architectures. We propose the Halfway loss function, which minimises the Normalised Margin Variance (NMV) at the output of a deep learning model, and evaluate its performance against the Softmax Cross-Entropy loss on the MNIST, smallNORB and CIFAR-10 datasets.
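One plausible reading of a normalised margin variance (our own interpretation, since the abstract does not define NMV precisely and the paper's formula may differ) takes the per-sample margin as the true-class score minus the best rival score, then the variance of margins normalised by their squared mean:

```python
def margin_variance(logits, labels):
    """Variance of per-sample margins (true-class score minus best rival
    score), normalised by the squared mean margin.  This is a hypothetical
    reading of 'Normalised Margin Variance', not the paper's definition."""
    margins = []
    for row, y in zip(logits, labels):
        rival = max(v for j, v in enumerate(row) if j != y)
        margins.append(row[y] - rival)
    mean = sum(margins) / len(margins)
    return sum((m - mean) ** 2 for m in margins) / (len(margins) * mean ** 2)

logits = [[4.0, 1.0, 0.0], [0.5, 3.5, 1.0], [0.0, 1.0, 3.0]]
labels = [0, 1, 2]
```

Minimising such a quantity pushes all samples toward a *common* margin rather than pushing every margin as large as possible, which is the contrast with SVM-style maximisation that the abstract draws.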
- Apr 20 2017 cs.CV arXiv:1704.05643v1 Action recognition from well-segmented 3D skeleton video has been intensively studied. However, due to the difficulty of representing 3D skeleton video and the lack of training data, action detection from streaming 3D skeleton video still lags far behind its recognition counterpart and image-based object detection. In this paper, we propose a novel approach to this problem which leverages both effective skeleton video encoding and deep regression-based object detection from images. Our framework consists of two parts: skeleton-based video-to-image mapping, which encodes a skeleton video as a color image in a temporally preserving way, and an end-to-end trainable fast skeleton action detector (Skeleton Boxes) based on image detection. Experimental results on the latest and largest PKU-MMD benchmark dataset demonstrate that our method outperforms the state-of-the-art methods by a large margin. We believe our idea will inspire and benefit future research in this important area.
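A generic skeleton-to-image encoding of this kind can be sketched as follows (an illustration of the idea only; the paper's exact color mapping may differ): joints index the rows, frames index the columns in temporal order, and the three color channels hold the x, y, z joint coordinates rescaled to 0..255.

```python
def skeleton_to_image(frames):
    """Encode a skeleton video as a color image in a temporally preserving
    way: rows are joints, columns are frames, channels are the x, y, z
    joint coordinates scaled per-axis to the 0..255 pixel range."""
    lo = [min(j[a] for f in frames for j in f) for a in range(3)]
    hi = [max(j[a] for f in frames for j in f) for a in range(3)]

    def scale(v, a):
        span = hi[a] - lo[a]
        return int(255 * (v - lo[a]) / span) if span else 0

    n_joints = len(frames[0])
    return [[tuple(scale(frame[j][a], a) for a in range(3))
             for frame in frames]
            for j in range(n_joints)]

# Two joints tracked over three frames, each joint as (x, y, z).
video = [[(0.0, 0.0, 0.0), (1.0, 2.0, 0.5)],
         [(0.5, 1.0, 0.0), (1.0, 2.0, 1.0)],
         [(1.0, 2.0, 0.0), (1.0, 2.0, 1.0)]]
img = skeleton_to_image(video)
```

Once the video is an image, any off-the-shelf image detector can localize actions along the time (column) axis, which is the premise of the Skeleton Boxes detector.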
- Apr 20 2017 cs.CV arXiv:1704.05629v1 Localization of anatomical structures is a prerequisite for many tasks in medical image analysis. We propose a method for automatic localization of one or more anatomical structures in 3D medical images through detection of their presence in 2D image slices using a convolutional neural network (ConvNet). A single ConvNet is trained to detect the presence of the anatomical structure of interest in axial, coronal, and sagittal slices extracted from a 3D image. To allow the ConvNet to analyze slices of different sizes, spatial pyramid pooling is applied. After detection, 3D bounding boxes are created by combining the output of the ConvNet over all slices. In the experiments, 200 chest CT, 100 cardiac CT angiography (CTA), and 100 abdomen CT scans were used. The heart, ascending aorta, aortic arch, and descending aorta were localized in chest CT scans, the left cardiac ventricle in cardiac CTA scans, and the liver in abdomen CT scans. Localization was evaluated using the distances between automatically and manually defined reference bounding box centroids and walls. The best results were achieved in localization of structures with clearly defined boundaries (e.g. aortic arch) and the worst when the structure boundary was not clearly visible (e.g. liver). The method was more robust and accurate when localizing multiple structures.
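The combination step can be sketched as follows (our own simplification of the idea): each slice in which the ConvNet detects the structure votes for the corresponding index range, and the 3D box is the product of the detected ranges along the three orthogonal axes.

```python
def bounding_box_3d(axial, coronal, sagittal):
    """Combine per-slice presence detections (one boolean per slice along
    each axis) into a 3D bounding box, as (lo, hi) slice-index pairs:
    z-extent from axial slices, y from coronal, x from sagittal."""
    def extent(hits):
        idx = [i for i, h in enumerate(hits) if h]
        return (idx[0], idx[-1]) if idx else None
    return extent(axial), extent(coronal), extent(sagittal)

# Structure detected in axial slices 2..5, coronal 1..3, sagittal 4..6.
box = bounding_box_3d(
    [False, False, True, True, True, True, False],
    [False, True, True, True, False],
    [False, False, False, False, True, True, True])
```

A real pipeline would additionally smooth the per-slice scores or take the largest run of positives to suppress isolated false detections; this sketch keeps only the min/max extent.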
- Biochemical reaction networks frequently consist of species evolving on multiple timescales. Stochastic simulations of such networks are often computationally challenging, and therefore various methods have been developed to obtain sensible stochastic approximations on the timescale of interest. One rigorous and popular approach is the multiscale approximation method for continuous-time Markov processes. In this approach, by scaling species abundances and reaction rates, a family of processes parameterized by a scaling parameter is defined. The limiting process of this family is then used to approximate the original process. However, we find that such approximations become inaccurate when combinations of species with disparate abundances either constitute conservation laws or form virtual slow auxiliary species. To obtain more accurate approximations in such cases, we propose an appropriate modification of the original method.
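The exact stochastic simulations that the multiscale method seeks to avoid are typically run with Gillespie's direct method; a minimal sketch for a birth-death network (production at rate k, degradation at rate g per molecule; the paper's multiscale approximation itself is not shown) is:

```python
import random

def gillespie_birth_death(k=10.0, g=1.0, x0=0, t_end=50.0, seed=1):
    """Gillespie's direct method for the network  0 -> X (rate k),
    X -> 0 (rate g*x).  Returns the trajectory as (time, copy number)."""
    rng = random.Random(seed)
    t, x = 0.0, x0
    traj = [(t, x)]
    while t < t_end:
        rates = [k, g * x]
        total = sum(rates)
        t += rng.expovariate(total)          # exponential waiting time
        if rng.random() < rates[0] / total:  # pick which reaction fires
            x += 1
        else:
            x -= 1
        traj.append((t, x))
    return traj

traj = gillespie_birth_death()
```

Each simulated event costs one step, so when fast reactions fire many orders of magnitude more often than the slow dynamics of interest, runs like this become prohibitively expensive, which motivates the scaled limiting processes discussed above.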