- Feb 23 2017 quant-ph arXiv:1702.06845v1In this work we consider the problem of certifying binary observables based on a Bell inequality violation alone, a task known as self-testing of measurements. We introduce a family of commutation-based measures, which encode all the distinct arrangements of two projective observables on a qubit. These quantities by construction take into account the usual limitations of self-testing and since they are `weighted' by the (reduced) state, they automatically deal with rank-deficient reduced density matrices. We show that these measures can be estimated from the observed Bell violation in several scenarios. The trade-offs turn out to be tight and, in particular, they give non-trivial statements for arbitrarily small violations. On the other extreme, observing the maximal violation allows us to deduce precisely the form of the observables, which immediately leads to a complete rigidity statement. In particular, we show that for all $n \geq 3$ the $n$-partite Mermin-Ardehali-Belinskii-Klyshko inequality self-tests the $n$-partite Greenberger-Horne-Zeilinger state and maximally incompatible qubit measurements on every site. Our results imply that any pair of projective observables on a qubit can be certified in a robust manner. Finally, we show that commutation-based measures give a convenient way of expressing relations between more than two observables.
- The training complexity of deep learning-based channel decoders scales exponentially with the codebook size and therefore with the number of information bits. Thus, neural network decoding (NND) is currently only feasible for very short block lengths. In this work, we show that the conventional iterative decoding algorithm for polar codes can be enhanced when sub-blocks of the decoder are replaced by neural network (NN) based components. Thus, we partition the encoding graph into smaller sub-blocks and train them individually, closely approaching maximum a posteriori (MAP) performance per sub-block. These blocks are then connected via the remaining conventional belief propagation decoding stage(s). The resulting decoding algorithm is non-iterative and inherently enables a high-level of parallelization, while showing a competitive bit error rate (BER) performance. We examine the degradation through partitioning and compare the resulting decoder to state-of-the-art polar decoders such as successive cancellation list and belief propagation decoding.
- The quantum channel between two particle detectors provides a prototype framework for the study of wireless quantum communication via relativistic quantum fields. In this article we calculate the classical channel capacity between two Unruh-DeWitt detectors arising from couplings within the perturbative regime. To this end, we identify the detector states which achieve maximal signal strength. We use these results to investigate the impact of relativistic effects on signaling between detectors in inertial and uniformly accelerated motion which communicate via a massless field in Minkowski spacetime.
- These are expanded lecture notes from lectures given at the Workshop on higher structures at MATRIX Melbourne. These notes give an introduction to Feynman categories and their applications. Feynman categories give a universal categorical way to encode operations and relations. This includes the aspects of operad--like theories such as PROPs, modular operads, twisted (modular) operads, properads, hyperoperads and their colored versions. There is more depth to the general theory as it applies as well to algebras over operads and an abundance of other related structures, such as crossed simplicial groups, the augmented simplicial category or FI--modules. Through decorations and transformations the theory is also related to the geometry of moduli spaces. Furthermore the morphisms in a Feynman category give rise to Hopf-- and bi--algebras with examples coming from topology, number theory and quantum field theory. All these aspects are covered.
- This paper concerns a fundamental class of convex matrix optimization problems. It presents the first algorithm that uses optimal storage and provably computes a low-rank approximation of a solution. In particular, when all solutions have low rank, the algorithm converges to a solution. This algorithm, SketchyCGM, modifies a standard convex optimization scheme, the conditional gradient method, to store only a small randomized sketch of the matrix variable. After the optimization terminates, the algorithm extracts a low-rank approximation of the solution from the sketch. In contrast to nonconvex heuristics, the guarantees for SketchyCGM do not rely on statistical models for the problem data. Numerical work demonstrates the benefits of SketchyCGM over heuristics.
- In this paper we study the use of coding techniques to accelerate machine learning (ML). Coding techniques, such as prefix codes, have been extensively studied and used to accelerate low-level data processing primitives such as scans in a relational database system. However, there is little work on how to exploit them to accelerate ML algorithms. In fact, applying coding techniques for faster ML faces a unique challenge: one needs to consider both how the codes fit into the optimization algorithm used to train a model, and the interplay between the model sstructure and the coding scheme. Surprisingly and intriguingly, our study demonstrates that a slight variant of the classical Lempel-Ziv-Welch (LZW) coding scheme is a good fit for several popular ML algorithms, resulting in substantial runtime savings. Comprehensive experiments on several real-world datasets show that our LZW-based ML algorithms exhibit speedups of up to 31x compared to a popular and state-of-the-art ML library, with no changes to ML accuracy, even though the implementations of our LZW variants are not heavily tuned. Thus, our study reveals a new avenue for accelerating ML algorithms using coding techniques and we hope this opens up a new direction for more research.
- Feb 23 2017 physics.chem-ph cond-mat.mes-hall arXiv:1702.06854v1Nano-crystalline diamond is a new carbon phase with numerous intriguing physical and chemical properties and applications. Small doped nanodiamonds for example do find increased use as novel quantum markers in biomedical applications. However, growing doped nanodiamonds below sizes of 5 nm with controlled composition has been elusive so far. Here we grow nanodiamonds under conditions where diamond-like organic seed molecules do not decompose. This is a key first step toward engineered growth of fluorescent nanodiamonds wherein a custom designed seed molecule can be incorporated at the center of a nanodiamond. By substituting atoms at particular locations in the seed molecule it will be possible to achieve complex multi-atom diamond color centers or even to engineer complete nitrogen-vacancy (NV) quantum registers. Other benefits include the potential to grow ultrasmall nanodiamonds, wherein each diamond no matter how small can have at least one bright and photostable fluorescent emitter.
- Given a set of $n$ points $P$ in the plane, the first layer $L_1$ of $P$ is formed by the points that appear on $P$'s convex hull. In general, a point belongs to layer $L_i$, if it lies on the convex hull of the set $P \setminus \bigcup_{j<i}\{L_j\}$. The \emphconvex layers problem is to compute the convex layers $L_i$. Existing algorithms for this problem either do not achieve the optimal $\mathcal{O}\left(n\log n\right)$ runtime and linear space, or are overly complex and difficult to apply in practice. We propose a new algorithm that is both optimal and simple. The simplicity is achieved by independently computing four sets of monotone convex chains in $\mathcal{O}\left(n\log n\right)$ time and linear space. These are then merged in $\mathcal{O}\left(n\log n\right)$ time.
- Feb 23 2017 cs.CR arXiv:1702.06715v1In this paper we present a method which allows attackers to covertly leak data from isolated, air-gapped computers. Our method utilizes the hard disk drive (HDD) activity LED which exists in most of today's desktop PCs, laptops and servers. We show that a malware can indirectly control the HDD LED, turning it on and off rapidly (up to 5800 blinks per second) - a rate that exceeds the visual perception capabilities of humans. Sensitive information can be encoded and leaked over the LED signals, which can then be received remotely by different kinds of cameras and light sensors. Compared to other LED methods, our method is unique, because it is also covert - the HDD activity LED routinely flickers frequently, and therefore the user may not be suspicious to changes in its activity. We discuss attack scenarios and present the necessary technical background regarding the HDD LED and its hardware control. We also present various data modulation methods and describe the implementation of a user-level malware, that doesn't require a kernel component. During the evaluation, we examine the physical characteristics of different colored HDD LEDs (red, blue, and white) and tested different types of receivers: remote cameras, extreme cameras, security cameras, smartphone cameras, drone cameras, and optical sensors. Finally, we discuss hardware and software countermeasures for such a threat. Our experiment shows that sensitive data can be successfully leaked from air-gapped computers via the HDD LED at a maximum bit rate of 4000 bits per second, depending on the type of receiver and its distance from the transmitter. Notably, this speed is 10 times faster than the existing optical covert channels for air-gapped computers. These rates allow fast exfiltration of encryption keys, keystroke logging, and text and binary files.
- Feb 23 2017 cs.LG arXiv:1702.06941v1In this paper, we propose an algebraic formalization of the two important classes of dynamic programming algorithms called forward and forward-backward algorithms. They are generalized extensively in this study so that a wide range of other existing algorithms is subsumed. Forward algorithms generalized in this study subsume the ordinary forward algorithm on trellises for sequence labeling, the inside algorithm on derivation forests for CYK parsing, a unidirectional message passing on acyclic factor graphs, the forward mode of automatic differentiation on computation graphs with addition and multiplication, and so on. In addition, we reveal algebraic structures underlying complicated computation with forward algorithms. By the aid of the revealed algebraic structures, we also propose a systematic framework to design complicated variants of forward algorithms. Forward-backward algorithms generalized in this study subsume the ordinary forward-backward algorithm on trellises for sequence labeling, the inside-outside algorithm on derivation forests for CYK parsing, the sum-product algorithm on acyclic factor graphs, the reverse mode of automatic differentiation (a.k.a. back propagation) on computation graphs with addition and multiplication, and so on. We also propose an algebraic characterization of what can be computed by forward-backward algorithms and elucidate the relationship between forward and forward-backward algorithms.
- We propose the existence of a new universality in classical chaotic systems when the number of degrees of freedom is large: the statistical property of the Lyapunov spectrum is described by Random Matrix Theory. We demonstrate it by studying the finite-time Lyapunov exponents of the matrix model of a stringy black hole and the mass deformed models. The massless limit, which has a dual string theory interpretation, is special in that the universal behavior can be seen already at t=0, while in other cases it sets in at late time. The same pattern is demonstrated also in the product of random matrices.
- Feb 23 2017 cs.CV arXiv:1702.06925v1Limited annotated data is available for the research of estimating facial expression intensities, which makes the training of deep networks for automated expression assessment very challenging. Fortunately, fine-tuning from a data-extensive pre-trained domain such as face verification can alleviate the problem. In this paper, we propose a transferred network that fine-tunes a state-of-the-art face verification network using expression-intensity labeled data with a regression layer. In this way, the expression regression task can benefit from the rich feature representations trained on a huge amount of data for face verification. The proposed transferred deep regressor is applied in estimating the intensity of facial action units (2017 EmotionNet Challenge) and in particular pain intensity estimation (UNBS-McMaster Shoulder-Pain dataset). It wins the second place in the challenge and achieves the state-of-the-art performance on Shoulder-Pain dataset. Particularly for Shoulder-Pain with the imbalance issue of different pain levels, a new weighted evaluation metric is proposed.
- Computations over the rational numbers often suffer from intermediate coefficient swell. One solution to this problem is to apply the given algorithm modulo a number of primes and then lift the modular results to the rationals. This method is guaranteed to work if we use a sufficiently large set of good primes. In many applications, however, there is no efficient way of excluding bad primes. In this note, we describe a technique for rational reconstruction which will nevertheless return the correct result, provided the number of good primes in the selected set of primes is large enough. We give a number of illustrating examples which are implemented using the computer algebra system Singular and the programming language Julia. We discuss applications of our technique in computational algebraic geometry.
- Feb 23 2017 cs.AI arXiv:1702.06915v1The field of Distributed Constraint Optimization has gained momentum in recent years, thanks to its ability to address various applications related to multi-agent cooperation. Nevertheless, solving Distributed Constraint Optimization Problems (DCOPs) optimally is NP-hard. Therefore, in large-scale, complex applications, incomplete DCOP algorithms are necessary. Current incomplete DCOP algorithms suffer of one or more of the following limitations: they (a) find local minima without providing quality guarantees; (b) provide loose quality assessment; or (c) are unable to benefit from the structure of the problem, such as domain-dependent knowledge and hard constraints. Therefore, capitalizing on strategies from the centralized constraint solving community, we propose a Distributed Large Neighborhood Search (D-LNS) framework to solve DCOPs. The proposed framework (with its novel repair phase) provides guarantees on solution quality, refining upper and lower bounds during the iterative process, and can exploit domain-dependent structures. Our experimental results show that D-LNS outperforms other incomplete DCOP algorithms on both structured and unstructured problem instances.
- Feb 23 2017 math.AG arXiv:1702.06912v1This note is based on the plenary talk given by the second author at MACIS 2015, the Sixth International Conference on Mathematical Aspects of Computer and Information Sciences. Motivated by some of the work done within the Priority Programme SPP 1489 of the German Research Council DFG, we discuss a number of current challenges in the development of Open Source computer algebra systems. The main focus is on algebraic geometry and the system Singular.
- Feb 23 2017 cs.CL arXiv:1702.06891v1We present an unsupervised explainable word embedding technique, called EVE, which is built upon the structure of Wikipedia. The proposed model defines the dimensions of a semantic vector representing a word using human-readable labels, thereby it readily interpretable. Specifically, each vector is constructed using the Wikipedia category graph structure together with the Wikipedia article link structure. To test the effectiveness of the proposed word embedding model, we consider its usefulness in three fundamental tasks: 1) intruder detection - to evaluate its ability to identify a non-coherent vector from a list of coherent vectors, 2) ability to cluster - to evaluate its tendency to group related vectors together while keeping unrelated vectors in separate clusters, and 3) sorting relevant items first - to evaluate its ability to rank vectors (items) relevant to the query in the top order of the result. For each task, we also propose a strategy to generate a task-specific human-interpretable explanation from the model. These demonstrate the overall effectiveness of the explainable embeddings generated by EVE. Finally, we compare EVE with the Word2Vec, FastText, and GloVe embedding techniques across the three tasks, and report improvements over the state-of-the-art.
- Person recognition aims at recognizing the same identity across time and space with complicated scenes and similar appearance. In this paper, we propose a novel method to address this task by training a network to obtain robust and representative features. A key observation is that traditional cross entropy loss only enforces the inter-class variation among samples and ignores to narrow down the similarity within each category. We propose a congenerous cosine loss to enlarge the inter-class distinction as well as alleviate the inner-class variance. Such a design is achieved by minimizing the cosine distance between sample and its cluster centroid in a cooperative way. Our method differs from previous work in person recognition that we do not conduct a second training on the test subset and thus maintain a good generalization ability. The identity of a person is determined by measuring the similarity from several body regions in the reference set. Experimental results show that the proposed approach achieves better classification accuracy against previous state-of-the-arts.
- We are proposing to use an ensemble of diverse specialists, where speciality is defined according to the confusion matrix. Indeed, we observed that for adversarial instances originating from a given class, labeling tend to be done into a small subset of (incorrect) classes. Therefore, we argue that an ensemble of specialists should be better able to identify and reject fooling instances, with a high entropy (i.e., disagreement) over the decisions in the presence of adversaries. Experimental results obtained confirm that interpretation, opening a way to make the system more robust to adversarial examples through a rejection mechanism, rather than trying to classify them properly at any cost.
- We explore methods of producing adversarial examples on deep generative models such as the variational autoencoder (VAE) and the VAE-GAN. Deep learning architectures are known to be vulnerable to adversarial examples, but previous work has focused on the application of adversarial examples to classification tasks. Deep generative models have recently become popular due to their ability to model input data distributions and generate realistic examples from those distributions. We present three classes of attacks on the VAE and VAE-GAN architectures and demonstrate them against networks trained on MNIST, SVHN and CelebA. Our first attack leverages classification-based adversaries by attaching a classifier to the trained encoder of the target generative model, which can then be used to indirectly manipulate the latent representation. Our second attack directly uses the VAE loss function to generate a target reconstruction image from the adversarial example. Our third attack moves beyond relying on classification or the standard loss for the gradient and directly optimizes against differences in source and target latent representations. We also motivate why an attacker might be interested in deploying such techniques against a target generative network.
- Feb 23 2017 astro-ph.IM arXiv:1702.06828v1Astrophysics and Space Science are becoming increasingly characterised by what is now known as "big data", the bottlenecks for progress partly shifting from data acquisition to "data mining". Truth is that the amount and rate of data accumulation in many fields already surpasses the local capabilities for its processing and exploitation, and the efficient conversion of scientific data into knowledge is everywhere a challenge. The result is that, to a large extent, isolated data archives risk being progressively likened to "data graveyards", where the information stored is not reused for scientific work. Responsible and efficient use of these large datasets means democratising access and extracting the most science possible from it, which in turn signifies improving data accessibility and integration. Improving data processing capabilities is another important issue specific to researchers and computer scientists of each field. The project presented here wishes to exploit the enormous potential opened up by information technology at our age to advance a model for a science data center in astronomy which aims to expand data accessibility and integration to the largest possible extent and with the greatest efficiency for scientific and educational use. Greater access to data means more people producing and benefiting from information, whereas larger integration of related data from different origins means a greater research potential and increased scientific impact.The project of the BSDC is preoccupied, primarily, with providing tools and solutions for the Brazilian astronomical community. It nevertheless capitalizes on extensive international experience, and is developed in cooperation with the ASI Science Data Center (ASDC), from the Italian Space Agency, granting it an essential ingredient of internationalisation. The BSDC is Virtual Observatory-compliant.
- The advancement in Autonomous Vehicles (AVs) has created an enormous market for the development of self-driving functionalities,raising the question of how it will transform the traditional vehicle development process. One adventurous proposal is to open the AV platform to third-party developers, so that AV functionalities can be developed in a crowd-sourcing way, which could provide tangible benefits to both automakers and end users. Some pioneering companies in the automotive industry have made the move to open the platform so that developers are allowed to test their code on the road. Such openness, however, brings serious security and safety issues by allowing untrusted code to run on the vehicle. In this paper, we introduce the concept of an Appified AV platform that opens the development framework to third-party developers. To further address the safety challenges, we propose an enhanced appified AV design schema called AVGuard, which focuses primarily on mitigating the threats brought about by untrusted code, leveraging theory in the vehicle evaluation field, and conducting program analysis techniques in the cybersecurity area. Our study provides guidelines and suggested practice for the future design of open AV platforms.
- Recent successes in word embedding and document embedding have motivated researchers to explore similar representations for networks and to use such representations for tasks such as edge prediction, node label prediction, and community detection. Existing methods are largely focused on finding distributed representations for unsigned networks and are unable to discover embeddings that respect polarities inherent in edges. We propose SIGNet, a fast scalable embedding method suitable for signed networks. Our proposed objective function aims to carefully model the social structure implicit in signed networks by reinforcing the principles of social balance theory. Our method builds upon the traditional word2vec family of embedding approaches but we propose a new targeted node sampling strategy to maintain structural balance in higher-order neighborhoods. We demonstrate the superiority of SIGNet over state-of-the-art methods proposed for both signed and unsigned networks on several real world datasets from different domains. In particular, SIGNet offers an approach to generate a richer vocabulary of features of signed networks to support representation and reasoning.
- Feb 23 2017 cs.CV arXiv:1702.06799v1Activity recognition from first-person (ego-centric) videos has recently gained attention due to the increasing ubiquity of the wearable cameras. There has been a surge of efforts adapting existing feature descriptors and designing new descriptors for the first-person videos. An effective activity recognition system requires selection and use of complementary features and appropriate kernels for each feature. In this study, we propose a data-driven framework for first-person activity recognition which effectively selects and combines features and their respective kernels during the training. Our experimental results show that use of Multiple Kernel Learning (MKL) and Boosted MKL in first-person activity recognition problem exhibits improved results in comparison to the state-of-the-art. In addition, these techniques enable the expansion of the framework with new features in an efficient and convenient way.
- Feb 23 2017 cs.CV arXiv:1702.06767v1In this paper, we propose a new simple and learning-free deep learning network named MomentsNet, whose convolution layer, nonlinear processing layer and pooling layer are constructed by Moments kernels, binary hashing and block-wise histogram, respectively. Twelve typical moments (including geometrical moment, Zernike moment, Tchebichef moment, etc.) are used to construct the MomentsNet whose recognition performance for binary image is studied. The results reveal that MomentsNet has better recognition performance than its corresponding moments in almost all cases and ZernikeNet achieves the best recognition performance among MomentsNet constructed by twelve moments. ZernikeNet also shows better recognition performance on binary image database than that of PCANet, which is a learning-based deep learning network.
- Feb 23 2017 cs.LG arXiv:1702.06763v1Recent studies have shown that deep neural networks (DNN) are vulnerable to adversarial samples: maliciously-perturbed samples crafted to yield incorrect model outputs. Such attacks can severely undermine DNN systems, particularly in security-sensitive settings. It was observed that an adversary could easily generate adversarial samples by making a small perturbation on irrelevant feature dimensions that are unnecessary for the current classification task. To overcome this problem, we introduce a defensive mechanism called DeepMask. By identifying and removing unnecessary features in a DNN model, DeepMask limits the capacity an attacker can use generating adversarial samples and therefore increase the robustness against such inputs. Comparing with other defensive approaches, DeepMask is easy to implement and computationally efficient. Experimental results show that DeepMask can increase the performance of state-of-the-art DNN models against adversarial samples.
- Feb 23 2017 cs.LG arXiv:1702.06762v1The idea of style transfer has largely only been explored in image-based tasks, which we attribute in part to the specific nature of loss functions used for style transfer. We propose a general formulation of style transfer as an extension of generative adversarial networks, by using a discriminator to regularize a generator with an otherwise separate loss function. We apply our approach to the task of learning to play chess in the style of a specific player, and present empirical evidence for the viability of our approach.
- When analyzing the genome, researchers have discovered that proteins bind to DNA based on certain patterns of the DNA sequence known as "motifs". However, it is difficult to manually construct motifs due to their complexity. Recently, externally learned memory models have proven to be effective methods for reasoning over inputs and supporting sets. In this work, we present memory matching networks (MMN) for classifying DNA sequences as protein binding sites. Our model learns a memory bank of encoded motifs, which are dynamic memory modules, and then matches a new test sequence to each of the motifs to classify the sequence as a binding or nonbinding site.
- Feb 23 2017 cs.MM arXiv:1702.06728v1Inspired by the recent advances of image super-resolution using convolutional neural network (CNN), we propose a CNN-based block up-sampling scheme for intra frame coding. A block can be down-sampled before being compressed by normal intra coding, and then up-sampled to its original resolution. Different from previous studies on down/up-sampling based coding, the up-sampling interpolation filters in our scheme have been designed by training CNN instead of hand-crafted. We explore a new CNN structure for up-sampling, which features deconvolution of feature maps, multi-scale fusion, and residue learning, making the network both compact and efficient. We also design different networks for the up-sampling of luma and chroma components, respectively, where the chroma up-sampling CNN utilizes the luma information to boost its performance. In addition, we design a two-stage up-sampling process, the first stage being within the block-by-block coding loop, and the second stage being performed on the entire frame, so as to refine block boundaries. We also empirically study how to set the coding parameters of down-sampled blocks for pursuing the frame-level rate-distortion optimization. Our proposed scheme is implemented into the High Efficiency Video Coding (HEVC) reference software, and a comprehensive set of experiments have been performed to evaluate our methods. Experimental results show that our scheme achieves significant bits saving compared with HEVC anchor especially at low bit rates, leading to on average 5.5% BD-rate on common test sequences and on average 9.0% BD-rate on ultra high definition (UHD) test sequences.
- For each integer $n$ we present an explicit formulation of a compact linear program, with $O(n^3)$ variables and constraints, which determines the satisfiability of any 2SAT formula with $n$ boolean variables by a single linear optimization. This contrasts with the fact that the natural polytope for this problem, formed from the convex hull of all satisfiable formulas and their satisfying assignments, has superpolynomial extension complexity. Our formulation is based on multicommodity flows. We also discuss connections of these results to the stable matching problem.
- Feb 23 2017 cs.CL arXiv:1702.06709v1Fine-grained entity type classification (FETC) is the task of classifying an entity mention to a broad set of types. Distant supervision paradigm is extensively used to generate training data for this task. However, generated training data assigns same set of labels to every mention of an entity without considering its local context. Existing FETC systems have two major drawbacks: assuming training data to be noise free and use of hand crafted features. Our work overcomes both drawbacks. We propose a neural network model that jointly learns entity mentions and their context representation to eliminate use of hand crafted features. Our model treats training data as noisy and uses non-parametric variant of hinge loss function. Experiments show that the proposed model outperforms previous state-of-the-art methods on two publicly available datasets, namely FIGER (GOLD) and BBN with an average relative improvement of 2.69% in micro-F1 score. Knowledge learnt by our model on one dataset can be transferred to other datasets while using same model or other FETC systems. These approaches of transferring knowledge further improve the performance of respective models.
- Visual question answering (VQA) has witnessed great progress since May, 2015 as a classic problem unifying visual and textual data into a system. Many enlightening VQA works explore deep into the image and question encodings and fusing methods, of which attention is the most effective and infusive mechanism. Current attention based methods focus on adequate fusion of visual and textual features, but lack the attention to where people focus to ask questions about the image. Traditional attention based methods attach a single value to the feature at each spatial location, which losses many useful information. To remedy these problems, we propose a general method to perform saliency-like pre-selection on overlapped region features by the interrelation of bidirectional LSTM (BiLSTM), and use a novel element-wise multiplication based attention method to capture more competent correlation information between visual and textual features. We conduct experiments on the large-scale COCO-VQA dataset and analyze the effectiveness of our model demonstrated by strong empirical results.
- Feb 23 2017 cs.CV arXiv:1702.06683v1The United States spends more than $1B each year on the American Community Survey (ACS), a labor-intensive door-to-door study that measures statistics relating to race, gender, education, occupation, unemployment, and other demographic factors. Although a comprehensive source of data, the lag between demographic changes and their appearance in the ACS can exceed half a decade. As digital imagery becomes ubiquitous and machine vision techniques improve, automated data analysis may provide a cheaper and faster alternative. Here, we present a method that determines socioeconomic trends from 50 million images of street scenes, gathered in 200 American cities by Google Street View cars. Using deep learning-based computer vision techniques, we determined the make, model, and year of all motor vehicles encountered in particular neighborhoods. Data from this census of motor vehicles, which enumerated 22M automobiles in total (8% of all automobiles in the US), was used to accurately estimate income, race, education, and voting patterns, with single-precinct resolution. (The average US precinct contains approximately 1000 people.) The resulting associations are surprisingly simple and powerful. For instance, if the number of sedans encountered during a 15-minute drive through a city is higher than the number of pickup trucks, the city is likely to vote for a Democrat during the next Presidential election (88% chance); otherwise, it is likely to vote Republican (82%). Our results suggest that automated systems for monitoring demographic trends may effectively complement labor-intensive approaches, with the potential to detect trends with fine spatial resolution, in close to real time.
- Feb 23 2017 cs.RO arXiv:1702.06680v1In this paper, we investigate the convergence and consistency properties of an Invariant-Extended Kalman Filter (RI-EKF) based Simultaneous Localization and Mapping (SLAM) algorithm. Basic convergence properties of this algorithm are proven. These proofs do not require the restrictive assumption that the Jacobians of the motion and observation models need to be evaluated at the ground truth. It is also shown that the output of RI-EKF is invariant under any stochastic rigid body transformation in contrast to $\mathbb{SO}(3)$ based EKF SLAM algorithm ($\mathbb{SO}(3)$-EKF) that is only invariant under deterministic rigid body transformation. Implications of these invariance properties on the consistency of the estimator are also discussed. Monte Carlo simulation results demonstrate that RI-EKF outperforms $\mathbb{SO}(3)$-EKF, Robocentric-EKF and the "First Estimates Jacobian" EKF, for 3D point feature based SLAM.
- We introduce a method by which a generative model learning the joint distribution between actions and future states can be used to automatically infer a control scheme for any desired reward function, which may be altered on the fly without retraining the model. In this method, the problem of action selection is reduced to one of gradient descent on the latent space of the generative model, with the model itself providing the means of evaluating outcomes and finding the gradient, much like how the reward network in Deep Q-Networks (DQN) provides gradient information for the action generator. Unlike DQN or Actor-Critic, which are conditional models for a specific reward, using a generative model of the full joint distribution permits the reward to be changed on the fly. In addition, the generated futures can be inspected to gain insight in to what the network 'thinks' will happen, and to what went wrong when the outcomes deviate from prediction.
- Colorization of grayscale images has been a hot topic in computer vision. Previous research mainly focuses on producing a colored image to match the original one. However, since many colors share the same gray value, an input grayscale image could be diversely colored while maintaining its reality. In this paper, we design a novel solution for unsupervised diverse colorization. Specifically, we leverage conditional generative adversarial networks to model the distribution of real-world item colors, in which we develop a fully convolutional generator with multi-layer noise to enhance diversity, with multi-layer condition concatenation to maintain reality, and with stride 1 to keep spatial information. With such a novel network architecture, the model yields highly competitive performance on the open LSUN bedroom dataset. The Turing test of 80 humans further indicates our generated color schemes are highly convincible.
- Feb 23 2017 cs.CL arXiv:1702.06672v1Children can use the statistical regularities of their environment to learn word meanings, a mechanism known as cross-situational learning. We take a computational approach to investigate how the information present during each observation in a cross-situational framework can affect the overall acquisition of word meanings. We do so by formulating various in-the-moment learning mechanisms that are sensitive to different statistics of the environment, such as counts and conditional probabilities. Each mechanism introduces a unique source of competition or mutual exclusivity bias to the model; the mechanism that maximally uses the model's knowledge of word meanings performs the best. Moreover, the gap between this mechanism and others is amplified in more challenging learning scenarios, such as learning from few examples.
- Real-time monitoring and responses to emerging public health threats rely on the availability of timely surveillance data. During the early stages of an epidemic, the ready availability of line lists with detailed tabular information about laboratory-confirmed cases can assist epidemiologists in making reliable inferences and forecasts. Such inferences are crucial to understand the epidemiology of a specific disease early enough to stop or control the outbreak. However, construction of such line lists requires considerable human supervision and therefore, difficult to generate in real-time. In this paper, we motivate Guided Deep List, the first tool for building automated line lists (in near real-time) from open source reports of emerging disease outbreaks. Specifically, we focus on deriving epidemiological characteristics of an emerging disease and the affected population from reports of illness. Guided Deep List uses distributed vector representations (ala word2vec) to discover a set of indicators for each line list feature. This discovery of indicators is followed by the use of dependency parsing based techniques for final extraction in tabular form. We evaluate the performance of Guided Deep List against a human annotated line list provided by HealthMap corresponding to MERS outbreaks in Saudi Arabia. We demonstrate that Guided Deep List extracts line list features with increased accuracy compared to a baseline method. We further show how these automatically extracted line list features can be used for making epidemiological inferences, such as inferring demographics and symptoms-to-hospitalization period of affected individuals.
- We propose a method to generate path-entangled $N00N$-state photons from quantum dots (QDs) and coupled nanocavities. In the systems we considered, cavity mode frequencies are tuned close to the biexciton two-photon resonance. Under appropriate conditions, the system can have the target $N00N$ state in the energy eigenstate, as a consequence of destructive quantum interference. The $N00N$ state can be generated by the resonant laser excitation. This method, first introduced for two-photon $N00N$ state ($N=2$), can be extended toward higher $N00N$ state ($N>2$) based on our recipe, which is applied to the case of $N=4$ as an example.
- We provide an algorithm for sampling the space of abstract simplicial complexes on a fixed number of vertices that aims to provide a balanced sampling over non-isomorphic complexes. Although sampling uniformly from geometrically distinct complexes is a difficult task with no known analytic algorithm, our generative and descriptive algorithm is designed with heuristics to help balance the combinatorial multiplicities of the states and more widely sample across the space of inequivalent configurations. We provide a formula for the exact probabilities with which this algorithm will produce a requested labeled state, and compare the algorithm to Kahle's multi-parameter model of exponential random simplicial complexes, demonstrating analytically that our algorithm performs better with respect to worst-case probability bounds on a given complex and providing numerical results illustrating the increased sampling efficiency over distinct classes.
- Feb 23 2017 cs.CL arXiv:1702.06589v1Advances in natural language processing tasks have gained momentum in recent years due to the increasingly popular neural network methods. In this paper, we explore deep learning techniques for answering multi-step reasoning questions that operate on semi-structured tables. Challenges here arise from the level of logical compositionality expressed by questions, as well as the domain openness. Our approach is weakly supervised, trained on question-answer-table triples without requiring intermediate strong supervision. It performs two phases: first, machine understandable logical forms (programs) are generated from natural language questions following the work of [Pasupat and Liang, 2015]. Second, paraphrases of logical forms and questions are embedded in a jointly learned vector space using word and character convolutional neural networks. A neural scoring function is further used to rank and retrieve the most probable logical form (interpretation) of a question. Our best single model achieves 34.8% accuracy on the WikiTableQuestions dataset, while the best ensemble of our models pushes the state-of-the-art score on this task to 38.7%, thus slightly surpassing both the engineered feature scoring baseline, as well as the Neural Programmer model of [Neelakantan et al., 2016].
- Feb 23 2017 cs.LG arXiv:1702.06559v1Recent advances in one-shot learning have produced models that can learn from a handful of labeled examples, for passive classification and regression tasks. This paper combines reinforcement learning with one-shot learning, allowing the model to decide, during classification, which examples are worth labeling. We introduce a classification task in which a stream of images are presented and, on each time step, a decision must be made to either predict a label or pay to receive the correct label. We present a recurrent neural network based action-value function, and demonstrate its ability to learn how and when to request labels. Through the choice of reward function, the model can achieve a higher prediction accuracy than a similar model on a purely supervised task, or trade prediction accuracy for fewer label requests.
- Feb 23 2017 cs.SY arXiv:1702.06557v1Evaluating the effectiveness and benefits of driver assistance systems is essential for improving the system performance. In this paper, we propose an efficient evaluation method for a semi-autonomous lane departure correction system. To achieve this, we apply a bounded Gaussian mixture model to describe drivers' stochastic lane departure behavior learned from naturalistic driving data, which can regenerate departure behaviors to evaluate the lane departure correction system. In the stochastic lane departure model, we conduct a dimension reduction to reduce the computation cost. Finally, to show the advantages of our proposed evaluation approach, we compare steering systems with and without lane departure assistance based on the stochastic lane departure model. The simulation results show that the proposed method can effectively evaluate the lane departure correction system.
- Object recognition is an important problem in computer vision, having diverse applications. In this work, we construct an end-to-end scene recognition pipeline consisting of feature extraction, encoding, pooling and classification. Our approach simultaneously utilize global feature descriptors as well as local feature descriptors from images, to form a hybrid feature descriptor corresponding to each image. We utilize DAISY features associated with key points within images as our local feature descriptor and histogram of oriented gradients (HOG) corresponding to an entire image as a global descriptor. We make use of a bag-of-visual-words encoding and apply Mini- Batch K-Means algorithm to reduce the complexity of our feature encoding scheme. A 2-level pooling procedure is used to combine DAISY and HOG features corresponding to each image. Finally, we experiment with a multi-class SVM classifier with several kernels, in a cross-validation setting, and tabulate our results on the fifteen scene categories dataset. The average accuracy of our model was 76.4% in the case of a 40%-60% random split of images into training and testing datasets respectively. The primary objective of this work is to clearly outline the practical implementation of a basic screne-recognition pipeline having a reasonable accuracy, in python, using open-source libraries. A full implementation of the proposed model is available in our github repository.
- We introduce an approach for the real-time (2Hz) creation of a dense map and alignment of a moving robotic agent within that map by rendering using a Graphics Processing Unit (GPU). This is done by recasting the scan alignment part of the dense mapping process as a rendering task. Alignment errors are computed from rendering the scene, comparing with range data from the sensors, and minimized by an optimizer. The proposed approach takes advantage of the advances in rendering techniques for computer graphics and GPU hardware to accelerate the algorithm. Moreover, it allows one to exploit information not used in classic dense mapping algorithms such as Iterative Closest Point (ICP) by rendering interfaces between the free space, occupied space and the unknown. The proposed approach leverages directly the rendering capabilities of the GPU, in contrast to other GPU-based approaches that deploy the GPU as a general purpose parallel computation platform. We argue that the proposed concept is a general consequence of treating perception problems as inverse problems of rendering. Many perception problems can be recast into a form where much of the computation is replaced by render operations. This is not only efficient since rendering is fast, but also simpler to implement and will naturally benefit from future advancements in GPU speed and rendering techniques. Furthermore, this general concept can go beyond addressing perception problems and can be used for other problem domains such as path planning.
- Feb 23 2017 hep-th arXiv:1702.06944v1
- Feb 23 2017 physics.ins-det arXiv:1702.06942v1
- Feb 23 2017 math.AG arXiv:1702.06938v1