- May 23 2018 quant-ph arXiv:1805.08385v1 Product formulas can be used to simulate Hamiltonian dynamics on a quantum computer by approximating the exponential of a sum of operators by a product of exponentials of the individual summands. This approach is both straightforward and surprisingly efficient. We show that by simply randomizing how the summands are ordered, one can prove stronger bounds on the quality of approximation and thereby give more efficient simulations. Indeed, we show that these bounds can be asymptotically better than previous bounds that exploit commutation between the summands, despite using much less information about the structure of the Hamiltonian. Numerical evidence suggests that our randomized algorithm may be advantageous even for near-term quantum simulation.
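The core idea can be illustrated numerically. Below is a minimal sketch (not the paper's exact algorithm) of a first-order product formula for a toy 2-qubit Hamiltonian, comparing a fixed summand ordering with one that is freshly permuted in every Trotter step; the specific Hamiltonian and step counts are illustrative choices.

```python
import numpy as np

def expm_herm(H, t):
    """exp(-i*H*t) for a Hermitian matrix H, via eigendecomposition."""
    vals, vecs = np.linalg.eigh(H)
    return (vecs * np.exp(-1j * vals * t)) @ vecs.conj().T

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# Non-commuting summands of a small 2-qubit Hamiltonian (toy example).
terms = [np.kron(X, X), np.kron(Z, I2), np.kron(I2, Z)]
H = sum(terms)
t, r = 1.0, 100  # total evolution time and number of Trotter steps

U_exact = expm_herm(H, t)

# Fixed ordering: repeat the same product exp(-iH_0 dt) exp(-iH_1 dt) ...
step = np.eye(4, dtype=complex)
for j in [0, 1, 2]:
    step = expm_herm(terms[j], t / r) @ step
U_fixed = np.linalg.matrix_power(step, r)

# Randomized ordering: draw an independent permutation in each step.
rng = np.random.default_rng(0)
U_rand = np.eye(4, dtype=complex)
for _ in range(r):
    for j in rng.permutation(3):
        U_rand = expm_herm(terms[j], t / r) @ U_rand

err_fixed = np.linalg.norm(U_fixed - U_exact, 2)
err_rand = np.linalg.norm(U_rand - U_exact, 2)
```

Both products converge to the exact evolution as the step count grows; the paper's contribution is the stronger error bounds one can prove for the averaged, randomized ordering.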
- We show that combining two different hypothetical enhancements to quantum computation---namely, quantum advice and non-collapsing measurements---would let a quantum computer solve any decision problem whatsoever in polynomial time, even though neither enhancement yields extravagant power by itself. This complements a related result due to Raz. The proof uses locally decodable codes.
- May 23 2018 quant-ph arXiv:1805.08227v1 We compare the performance of quantum error correcting codes when memory errors are unitary with the more familiar case of dephasing noise. For a wide range of codes we analytically compute the effective logical channel that results when the error correction steps are performed noiselessly. Our examples include the entire family of repetition codes, the 5-qubit, Steane, Shor, and surface codes. When errors are measured in terms of the diamond norm, we find that the error correction is typically much more effective for unitary errors than for dephasing. We observe this behavior for a wide range of codes after a single level of encoding, and in the thresholds of concatenated codes using hard decoders. We show that this holds with great generality by proving a bound on the performance of any stabilizer code when the noise at the physical level is unitary. By comparing the diamond norm error $D'_\diamond$ of the logical qubit with the same quantity at the physical level $D_\diamond$, we show that $D'_\diamond \le c D^d_\diamond $ where $d$ is the distance of the code and $c$ is a constant that depends on the code but not on the error. This bound compares very favorably to the performance of error correction for dephasing noise and other Pauli channels, where an error correcting code of odd distance $d$ will exhibit a scaling $D'_\diamond \sim D_\diamond^{(d+1)/2}$.
- Quantum mechanics fundamentally forbids deterministic discrimination of quantum states and processes. However, the ability to optimally distinguish various classes of quantum data is an important primitive in quantum information science. In this work, we train near-term quantum circuits to classify data represented by non-orthogonal quantum probability distributions using the Adam stochastic optimization algorithm. This is achieved by iterative interactions of a classical device with a quantum processor to discover the parameters of an unknown non-unitary quantum circuit. This circuit learns to simulate the unknown structure of a generalized quantum measurement, or positive operator-valued measure (POVM), that is required to optimally distinguish possible distributions of quantum inputs. Notably, we use universal circuit topologies, with a theoretically motivated circuit design, which guarantees that our circuits can in principle learn to perform arbitrary input-output mappings. Our numerical simulations show that shallow quantum circuits can be trained to discriminate among various pure and mixed quantum states, exhibiting a trade-off between minimizing erroneous and inconclusive outcomes, with performance comparable to theoretically optimal POVMs. We train the circuit on different classes of quantum data and evaluate the generalization error on unseen mixed quantum states. This generalization power distinguishes our work from standard circuit optimization and provides an example of quantum machine learning for a task that has inherently no classical analogue.
- The quantum chromatic number, $\chi_q(G)$, of a graph $G$ was originally defined as the minimal number of colors necessary in a quantum protocol in which two provers that cannot communicate with each other but share an entangled state can convince an interrogator with certainty that they have a coloring of the graph. We use an equivalent purely combinatorial definition of $\chi_q(G)$ to prove that many spectral lower bounds for the chromatic number, $\chi(G)$, are also lower bounds for $\chi_q(G)$. This is achieved using techniques from linear algebra called pinching and twirling. We illustrate our results with some examples.
- May 23 2018 quant-ph arXiv:1805.08721v1 We describe a general procedure for associating a minimal informationally-complete quantum measurement (or MIC) and a set of linearly independent post-measurement quantum states with a purely probabilistic representation of the Born Rule. Such representations are motivated by QBism, where the Born Rule is understood as a consistency condition between probabilities assigned to the outcomes of one experiment in terms of the probabilities assigned to the outcomes of other experiments. In this setting, the difference between quantum and classical physics is the way their physical assumptions augment bare probability theory: Classical physics corresponds to a trivial augmentation---one just applies the Law of Total Probability (LTP) between the scenarios---while quantum theory makes use of the Born Rule expressed in one or another of the forms of our general procedure. To mark the essential difference between quantum and classical, one should seek the representations that minimize the disparity between the expressions. We prove that the representation of the Born Rule obtained from a symmetric informationally-complete measurement (or SIC) minimizes this distinction in at least two senses---the first to do with unitarily invariant distance measures between the rules, and the second to do with available volume in a reference probability simplex (roughly speaking a new kind of uncertainty principle). Both of these arise from a significant majorization result. This work complements recent studies in quantum computation where the deviation of the Born Rule from the LTP is measured in terms of negativity of Wigner functions.
- Given a quantum many-body system with few-body interactions, how rapidly can quantum information be hidden during time evolution? The fast scrambling conjecture is that the time to thoroughly mix information among N degrees of freedom grows at least logarithmically in N. We derive this inequality for generic quantum systems at infinite temperature, by relating the scrambling time to a finite decay time of local quantum correlations at late times. Using Lieb-Robinson bounds, generalized Sachdev-Ye-Kitaev models, and random unitary circuits, we propose that a logarithmic scrambling time can be achieved in most quantum systems with sparse connectivity. These models also elucidate how quantum chaos is not universally related to scrambling: we construct random few-body circuits with infinite Lyapunov exponent but logarithmic scrambling time. We discuss analogies between quantum models on graphs and quantum black holes, and suggest methods to experimentally study scrambling with as many as 100 sparsely-connected quantum degrees of freedom.
- May 23 2018 cond-mat.stat-mech arXiv:1805.08487v1 We consider a random time evolution operator composed of a circuit of random unitaries coupling even and odd neighbouring spins on a chain in turn. In the spirit of Floquet evolution, the circuit is time-periodic; each timestep is repeated with the same random instances. We obtain analytical results for arbitrary local Hilbert space dimension $d$: on a single site, the average time evolution acts as a depolarising channel. In the spin-1/2 ($d=2$) case, this is further quantified numerically. For that purpose, we develop a new numerical method that reduces the complexity by an exponential factor. Haar-distributed unitaries lead to full depolarisation after many timesteps, i.e. local thermalisation. A unitary probability distribution with tunable coupling strength allows us to observe a many-body localisation transition. In addition to a spin chain under a unitary circuit, we consider the analogous problem with Gaussian circuits. There we can make stronger statements about the entire covariance matrix instead of single sites only, and find that the dynamics is localising. For a random time evolution operator homogeneous in space, however, the system delocalises.
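The single-site depolarising statement is easy to check by Monte Carlo: averaging $U \rho U^\dagger$ over Haar-random unitaries drives any state toward the maximally mixed state $I/d$. The sketch below (an illustration of that fact only, not of the paper's circuit model) samples Haar unitaries via the QR decomposition of a Ginibre matrix.

```python
import numpy as np

def haar_unitary(d, rng):
    """Sample a Haar-random d x d unitary via QR of a complex Gaussian matrix."""
    g = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    q, r = np.linalg.qr(g)
    # Fix the column phases so the distribution is exactly Haar.
    ph = np.diagonal(r) / np.abs(np.diagonal(r))
    return q * ph

d = 2
rho = np.array([[1, 0], [0, 0]], dtype=complex)  # pure state |0><0|
rng = np.random.default_rng(1)

n = 20000
avg = np.zeros((d, d), dtype=complex)
for _ in range(n):
    U = haar_unitary(d, rng)
    avg += U @ rho @ U.conj().T
avg /= n

# Distance from the maximally mixed state; shrinks as n grows.
deviation = np.linalg.norm(avg - np.eye(d) / d)
```

Exactly, $\int dU\, U\rho U^\dagger = \operatorname{Tr}(\rho)\, I/d$; the finite-sample average only approaches this at rate $O(1/\sqrt{n})$.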
- One of the main difficulties in analyzing neural networks is the non-convexity of the loss function which may have many bad local minima. In this paper, we study the landscape of neural networks for binary classification tasks. Under mild assumptions, we prove that after adding one special neuron with a skip connection to the output, or one special neuron per layer, every local minimum is a global minimum.
- We develop the first Bayesian Optimization algorithm, BLOSSOM, which selects between multiple alternative acquisition functions and traditional local optimization at each step. This is combined with a novel stopping condition based on expected regret. This pairing allows us to obtain the best characteristics of both local and Bayesian optimization, making efficient use of function evaluations while yielding superior convergence to the global minimum on a selection of optimization problems, and also halting optimization once a principled and intuitive stopping condition has been fulfilled.
- The Strong Exponential Time Hypothesis and the OV-conjecture are two popular hardness assumptions used to prove a plethora of lower bounds, especially in the realm of polynomial-time algorithms. The OV-conjecture in moderate dimension states that there is no $\epsilon>0$ for which an $O(N^{2-\epsilon})\mathrm{poly}(D)$ time algorithm can decide whether there is a pair of orthogonal vectors in a given set of $N$ binary vectors of dimension $D$. We strengthen the evidence for these hardness assumptions. In particular, we show that if the OV-conjecture fails, then two problems for which we are far from obtaining even tiny improvements over exhaustive search would have surprisingly fast algorithms. If the OV-conjecture is false, then there is a fixed $\epsilon>0$ such that: (1) For all $d$ and all large enough $k$, there is a randomized algorithm that takes $O(n^{(1-\epsilon)k})$ time to solve the Zero-Weight-$k$-Clique and Min-Weight-$k$-Clique problems on $d$-hypergraphs with $n$ vertices. As a consequence, the OV-conjecture is implied by the Weighted Clique conjecture. (2) For all $c$, the satisfiability of sparse TC$^1$ circuits on $n$ inputs (that is, circuits with $cn$ wires, depth $c\log n$, and negation, AND, OR, and threshold gates) can be computed in time $O((2-\epsilon)^n)$.
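For concreteness, the exhaustive-search baseline that the OV-conjecture asserts cannot be substantially beaten is simply the $O(N^2 \cdot D)$ pairwise check below (a toy sketch; the vectors and instances are made up).

```python
def has_orthogonal_pair(vectors):
    """Return True iff some pair u, v in `vectors` has inner product 0.

    Brute force over all pairs: O(N^2 * D) for N binary vectors of
    dimension D -- exactly the running time the OV-conjecture says
    cannot be improved to O(N^(2-eps)) * poly(D).
    """
    n = len(vectors)
    for i in range(n):
        for j in range(i + 1, n):
            if all(a * b == 0 for a, b in zip(vectors[i], vectors[j])):
                return True
    return False

# (1,0,1) and (0,1,0) are orthogonal.
found = has_orthogonal_pair([[1, 0, 1], [0, 1, 0], [1, 1, 1]])
# Every pair here shares a common 1-coordinate.
none = has_orthogonal_pair([[1, 1, 0], [0, 1, 1], [1, 0, 1]])
```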
- Feature hashing, also known as the hashing trick, introduced by Weinberger et al. (2009), is one of the key techniques used in scaling up machine learning algorithms. Loosely speaking, feature hashing uses a random sparse projection matrix $A : \mathbb{R}^n \to \mathbb{R}^m$ (where $m \ll n$) in order to reduce the dimension of the data from $n$ to $m$ while approximately preserving the Euclidean norm. Every column of $A$ contains exactly one non-zero entry, equal to either $-1$ or $1$. Weinberger et al. showed tail bounds on $\|Ax\|_2^2$. Specifically, they showed that for every $\varepsilon, \delta$, if $\|x\|_{\infty} / \|x\|_2$ is sufficiently small, and $m$ is sufficiently large, then $$\Pr[ \; | \;\|Ax\|_2^2 - \|x\|_2^2\; | < \varepsilon \|x\|_2^2 \;] \ge 1 - \delta \;.$$ These bounds were later extended by Dasgupta et al. (2010) and most recently refined by Dahlgaard et al. (2017); however, the true nature of the performance of this key technique, and specifically the correct tradeoff between the pivotal parameters $\|x\|_{\infty} / \|x\|_2, m, \varepsilon, \delta$, remained an open question. We settle this question by giving tight asymptotic bounds on the exact tradeoff between the central parameters, thus providing a complete understanding of the performance of feature hashing. We complement the asymptotic bound with empirical data, which shows that the constants "hiding" in the asymptotic notation are, in fact, very close to $1$, thus further illustrating the tightness of the presented bounds in practice.
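A minimal sketch of the projection just described: each input coordinate $i$ is assigned a random bucket $h(i)$ and a random sign $s(i)$, so the implicit matrix $A$ has exactly one $\pm 1$ per column. (Building $A$ explicitly is for illustration only; in practice one streams over the nonzero features.)

```python
import numpy as np

def feature_hash_matrix(n, m, rng):
    """Explicit m x n feature-hashing matrix: one random +/-1 per column."""
    A = np.zeros((m, n))
    buckets = rng.integers(0, m, size=n)     # h(i): target bucket of coordinate i
    signs = rng.choice([-1.0, 1.0], size=n)  # s(i): random sign of coordinate i
    A[buckets, np.arange(n)] = signs
    return A

rng = np.random.default_rng(0)
n, m = 1000, 64
A = feature_hash_matrix(n, m, rng)
nonzeros_per_column = (A != 0).sum(axis=0)

# For a "spread-out" x (small ||x||_inf / ||x||_2), the squared norm is
# approximately preserved: E[||Ax||^2] = ||x||^2.
x = rng.standard_normal(n)
ratio = np.linalg.norm(A @ x) ** 2 / np.linalg.norm(x) ** 2
```

The paper's contribution is the exact tradeoff between $\|x\|_\infty/\|x\|_2$, $m$, $\varepsilon$, and $\delta$ governing how tightly this `ratio` concentrates around 1.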
- This paper contributes to cross-lingual image annotation and retrieval in terms of data and methods. We propose COCO-CN, a novel dataset enriching MS-COCO with manually written Chinese sentences and tags. For more effective annotation acquisition, we develop a recommendation-assisted collective annotation system, automatically providing an annotator with several tags and sentences deemed to be relevant with respect to the pictorial content. Having 20,342 images annotated with 27,218 Chinese sentences and 70,993 tags, COCO-CN is currently the largest Chinese-English dataset applicable for cross-lingual image tagging, captioning and retrieval. We develop methods per task for effectively learning from cross-lingual resources. Extensive experiments on the multiple tasks justify the viability of our dataset and methods.
- May 23 2018 quant-ph arXiv:1805.08642v1 The far-field radiation pattern of three dipole-coupled two-level atoms is shown to yield sub- and super-radiant behavior, with the nature of the light quanta controlled by the underlying quantum correlations. Superradiance is found to faithfully reflect the monogamy of quantum correlations and is robust against thermal effects. It persists at finite temperature with reduced intensity, even in the absence of entanglement but with non-zero quantum discord. The intensity of emitted radiation is highly focused and anisotropic in one phase and completely uniform in another, with the two phases separated by a crossover. The radiation intensity is shown to exhibit periodic variation from super- to sub-radiant behavior, as a function of interatomic spacing and observation angle, which persists up to significantly high temperatures. The precise effects of the transition frequency and inter-dipole spacing on the angular spread and variations of the intensity in the uniform and non-uniform regimes are explicitly demonstrated at finite temperature. The photon-photon correlation is shown to exhibit sub- and super-Poissonian statistics in a parametrically controlled manner.
- Although interactive learning puts the user into the loop, the learner remains mostly a black box for the user. Understanding the reasons behind queries and predictions is important when assessing how the learner works and, in turn, deciding whether to trust it. Consequently, we propose the novel framework of explanatory interactive learning: in each step, the learner explains its interactive query to the user, and the user can inspect any active classifier by visualizing explanations of the corresponding predictions. We demonstrate that this can boost the predictive and explanatory power of, and the trust in, the learned model, using text (e.g. SVMs) and image classification (e.g. neural networks) experiments as well as a user study.
- May 23 2018 cs.CV arXiv:1805.08542v1 Many computer vision and image processing applications rely on local features. It is well known that motion blur decreases the performance of traditional feature detectors and descriptors. We propose an inertial-based deblurring method for improving the robustness of existing feature detectors and descriptors against motion blur. Unlike most deblurring algorithms, the method can handle spatially-variant blur and rolling shutter distortion. Furthermore, it is capable of running in real time, in contrast to state-of-the-art algorithms. The limitations of inertial-based blur estimation are taken into account by validating the blur estimates using image data. The evaluation shows that when the method is used with a traditional feature detector and descriptor, it increases the number of detected keypoints, provides higher repeatability and improves the localization accuracy. We also demonstrate that such features lead to more accurate and complete reconstructions when used in 3D visual reconstruction.
- We show that it is possible to obtain an $O(\epsilon^{-4/3})$ runtime --- including computational cost --- for finding $\epsilon$-stationary points of nonconvex functions using cutting plane methods. This improves on the best known $\epsilon$ dependence of $O(\epsilon^{-3/2})$, achieved by cubic regularized Newton, as proved by Nesterov and Polyak (2006). Our techniques utilize the "convex until proven guilty" principle proposed by Carmon, Duchi, Hinder, and Sidford (2017).
- We study a recent model of collaborative PAC learning where $k$ players with $k$ different tasks collaborate to learn a single classifier that works for all tasks. Previous work showed that when there is a classifier that has very small error on all tasks, there is a collaborative algorithm that finds a single classifier for all tasks and it uses $O(\ln^2 (k))$ times the sample complexity to learn a single task. In this work, we design new algorithms for both the realizable and the non-realizable settings using only $O(\ln (k))$ times the sample complexity to learn a single task. The sample complexity upper bounds of our algorithms match previous lower bounds and in some range of parameters are even better than previous algorithms that are allowed to output different classifiers for different tasks.
- May 23 2018 quant-ph cond-mat.stat-mech arXiv:1805.08305v1 Quantum open systems evolve according to completely positive, trace-preserving maps acting on the density operator, which can equivalently be unraveled in terms of so-called quantum trajectories. These stochastic sequences of pure states correspond to the actual dynamics of the quantum system during single realizations of an experiment in which the system's environment is monitored. In this chapter, we present an extension of stochastic thermodynamics to the case of open quantum systems, which builds on the analogy between quantum trajectories and the trajectories in phase space of classical stochastic thermodynamics. We analyze entropy production, work and heat exchanges at the trajectory level, identifying genuinely quantum contributions due to decoherence induced by the environment. We present three examples: the thermalization of a quantum system, the fluorescence of a driven qubit and the continuous monitoring of a qubit's observable.
- May 23 2018 cond-mat.str-el arXiv:1805.08232v1 We propose graphene based moiré super-lattice systems to realize nearly flat Chern bands. We study a number of systems of twisted double layers with small twist angle. Each layer is chosen to be either AB stacked bilayer graphene (BG), ABC stacked trilayer graphene (TG), or hexagonal boron nitride (h-BN) (specifically the following twisted systems: BG/h-BN, TG/h-BN, BG/BG, TG/TG, TG/BG). In these systems a vertically applied electric field enables control of the bandwidth and, interestingly, also the Chern number. We find that the Chern numbers of the bands associated with each of the two microscopic valleys can be $0, \pm 1, \pm 2, \pm 3$ depending on the specific system and the vertical electric field. Fully filling these bands ($\nu_T = 4$ electrons/moiré site) leads to a Quantum Valley Hall state. At partial band filling, Coulomb interactions are expected to play an important role. At integer electron fillings (per moiré site) $\nu_T = 1, 2, 3$, the natural states are spontaneously spin/valley polarized to give fully filled bands. This leads to Anomalous Integer Quantum (Valley) Hall states with Chern number $C \geq 1$. At fractional filling, Fractional Quantum Anomalous Hall insulators or Fractional Quantum Valley Hall insulators (both abelian and non-abelian) are possible when the interaction is strong enough. For Chern number $|C|=2$, unconventional superconductivity may arise from condensation of charge $2e$ bosonic skyrmions upon doping the spin and valley polarized Integer Quantum Anomalous Hall insulator with Hall conductivity $\sigma^c_{xy}=2\frac{e^2}{h}$ at filling $\nu_T = 1, 3$. We also discuss conceptual similarities and implications for modeling twisted bilayer graphene systems.
- A Convolutional Feature Map based Deep Network targeted towards Traffic Detection and Classification. May 23 2018 cs.CV arXiv:1805.08769v1 This research focuses on traffic detection, essentially involving object detection and classification. The particular work discussed here is motivated by unsatisfactory attempts to re-use well-known pre-trained object detection networks for domain-specific data. In the process, some trivial issues leading to a prominent performance drop are identified and ways to resolve them are discussed. For example, some simple yet relevant tricks regarding data collection and sampling prove to be very beneficial. Also, introducing a blur net to deal with blurred real-time data is another important factor in improving performance. We further study neural network design issues for beneficial object classification and employ shared, region-independent convolutional features. Adaptive learning rates to deal with saddle points are also investigated, and an average-covariance-matrix-based pre-conditioned approach is proposed. We also introduce the use of optical flow features to accommodate orientation information. Experimental results demonstrate a steady rise in performance.
- Currently, progressively larger deep neural networks are trained on ever growing data corpora. As this trend is only going to increase in the future, distributed training schemes are becoming increasingly relevant. A major issue in distributed training is the limited communication bandwidth between contributing nodes, or prohibitive communication cost in general. These challenges become even more pressing as the number of computation nodes increases. To counteract this development we propose sparse binary compression (SBC), a compression framework that allows for a drastic reduction of communication cost for distributed training. SBC combines existing techniques of communication delay and gradient sparsification with a novel binarization method and optimal weight update encoding to push compression gains to new limits. By doing so, our method also allows us to smoothly trade off gradient sparsity and temporal sparsity to adapt to the requirements of the learning task. Our experiments show that SBC can reduce the upstream communication on a variety of convolutional and recurrent neural network architectures by more than four orders of magnitude without significantly harming the convergence speed in terms of forward-backward passes. For instance, we can train ResNet50 on ImageNet to the baseline accuracy in the same number of iterations using $3531\times$ fewer bits, or train it to a $1\%$ lower accuracy using $37208\times$ fewer bits. In the latter case, the total upstream communication required is cut from 125 terabytes to 3.35 gigabytes for every participating client.
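The sparsification-plus-binarization idea can be sketched as follows (a simplification for illustration, not the exact SBC encoder): keep only the $k$ largest-magnitude gradient entries and replace each survivor by a single shared magnitude with its original sign, so each kept entry costs one sign bit plus an index.

```python
import numpy as np

def sparse_binary_compress(grad, k):
    """Top-k sparsification followed by sign-magnitude binarization.

    All kept entries share one magnitude (the mean absolute value of the
    top-k), so the message is k indices, k sign bits, and one float.
    """
    out = np.zeros_like(grad)
    idx = np.argsort(np.abs(grad))[-k:]   # indices of the k largest magnitudes
    mu = np.abs(grad[idx]).mean()         # single shared magnitude
    out[idx] = np.sign(grad[idx]) * mu
    return out

rng = np.random.default_rng(0)
g = rng.standard_normal(10_000)           # stand-in for a gradient vector
k = 100
g_hat = sparse_binary_compress(g, k)

sparsity = int((g_hat != 0).sum())
distinct_magnitudes = len(set(np.round(np.abs(g_hat[g_hat != 0]), 12)))
```

Here only 1% of the entries survive and they all share one magnitude, which is what makes the aggressive bit-level encoding reported in the abstract possible.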
- May 23 2018 math.NA arXiv:1805.08766v1 In recent work we have developed a renormalization framework for stabilizing reduced order models for time-dependent partial differential equations. We have applied this framework to the open problem of finite-time singularity formation (blow-up) for the 3D Euler equations of incompressible fluid flow. The renormalized coefficients in the reduced order models decay algebraically with time and resolution. Our results for the behavior of the solutions are consistent with the formation of a finite-time singularity.
- Efficient computability is an important property of solution concepts in matching markets. We consider the computational complexity of finding and verifying various solution concepts in trading networks---multi-sided matching markets with bilateral contracts---under the assumption of full substitutability of agents' preferences. First, we show that outcomes that satisfy an economically intuitive solution concept---trail stability---always exist and can be found in linear time. Second, we consider a slightly stronger solution concept in which agents can simultaneously offer an upstream and a downstream contract. We show that deciding the existence of outcomes satisfying this solution concept is an NP-complete problem even in a special (flow network) case of our model. It follows that the existence of stable outcomes---immune to deviations by arbitrary sets of agents---is also an NP-complete problem in trading networks (and in flow networks). Finally, we show that even verifying whether a given outcome is stable is NP-complete in trading networks.
- May 23 2018 math.OC arXiv:1805.08756v1 We prove that a simple Sequential Quadratic Programming (SQP) algorithm for equality constrained optimization has local linear convergence with rate $1 - 1/\kappa_R$, where $\kappa_R$ is the condition number of the Riemannian Hessian. Our analysis builds on insights from Riemannian optimization and indicates that first-order Riemannian algorithms and "simple" SQP algorithms have nearly identical local behavior. Unlike Riemannian algorithms, SQP avoids calculating the projection or retraction of the points onto the manifold at each iterate. All the SQP iterates are automatically quadratically close to the manifold near the local minimizer.
- We study the fundamental problem of distributed energy-aware network formation with mobile agents of limited computational power that have the capability to wirelessly transmit and receive energy in a peer-to-peer manner. Specifically, we design simple distributed protocols consisting of a small number of states and interaction rules for the construction of both arbitrary and binary trees. Further, we theoretically and experimentally evaluate a plethora of energy redistribution protocols that exploit different levels of knowledge in order to achieve desired energy distributions which require, for instance, that every agent has twice the energy of the agents of higher depth (according to the tree network). Our study shows that without using any knowledge about the network structure, such energy distributions cannot be achieved in a timely manner, which means that there might be high energy loss during the redistribution process. On the other hand, only a few extra bits of information seem to be enough to guarantee quick convergence to energy distributions that satisfy particular properties, yielding low energy loss.
- In recent years, due to the booming development of online social networks, fake news for various commercial and political purposes has been appearing in large numbers and spreading widely in the online world. With deceptive words, online social network users can easily be misled by such fake news, which has already had a tremendous effect on offline society. An important goal in improving the trustworthiness of information in online social networks is to identify fake news in a timely manner. This paper investigates the principles, methodologies and algorithms for detecting fake news articles, creators and subjects from online social networks, and evaluates the corresponding performance. It addresses the challenges introduced by the unknown characteristics of fake news and the diverse connections among news articles, creators and subjects. Based on a detailed data analysis, this paper introduces a novel automatic fake news credibility inference model, namely FakeDetector. Based on a set of explicit and latent features extracted from the textual information, FakeDetector builds a deep diffusive network model to learn the representations of news articles, creators and subjects simultaneously. Extensive experiments on a real-world fake news dataset compare FakeDetector with several state-of-the-art models, and the results demonstrate the effectiveness of the proposed model.
- Programming has been an important skill for researchers and practitioners in computer science and other related areas. To learn basic programming skills, long-term systematic training is usually required for beginners. According to a recent market report, the computer software market is expected to continue expanding at an accelerating speed, but the market supply of qualified software developers can hardly meet such a huge demand. In recent years, the surge of text generation research provides opportunities to address this dilemma through automatic program synthesis. In this paper, we approach the program synthesis problem from a data mining perspective. To address the problem, a novel generative model, namely EgoCoder, is introduced. EgoCoder effectively parses program code into abstract syntax trees (ASTs), whose nodes contain the program code/comment content and whose structure captures the program logic flows. Based on a new unit model called Hsu, EgoCoder can effectively capture both the hierarchical and sequential patterns in the program ASTs. Extensive experiments are conducted to compare EgoCoder with state-of-the-art text generation methods, and the experimental results demonstrate the effectiveness of EgoCoder in addressing the program synthesis problem.
- May 23 2018 math.AG arXiv:1805.08745v1 This dissertation comprises three collections of results, all united by a common theme. The theme is the study of categories via algebraic techniques, considering categories themselves as algebraic objects. This algebraic approach to category theory is central to noncommutative algebraic geometry, as realized by recent advances in the study of noncommutative motives. We succeed in proving algebraic results in the general setting of symmetric monoidal and semiring $\infty$-categories, which categorify abelian groups and rings, respectively. For example, we prove that modules over the semiring category Fin of finite sets are cocartesian monoidal $\infty$-categories, and modules over Burn (the Burnside $\infty$-category) are additive $\infty$-categories. As a consequence, we can regard Lawvere theories as cyclic $\text{Fin}^\text{op}$-modules, leading to algebraic foundations for the higher categorical study of Lawvere theories. We prove that Lawvere theories function as a home for an algebraic Yoneda lemma. Finally, we provide evidence for a formal duality between naive and genuine equivariant homotopy theory, in the form of a group-theoretic Eilenberg-Watts Theorem. This sets up a parallel between equivariant homotopy theory and motivic homotopy theory, where Burnside constructions are analogous to Morita theory. We conjecture that this relationship could be made precise within the context of noncommutative motives over the field with one element.
- This work presents CascadeCNN, an automated toolflow that pushes the quantisation limits of any given CNN model, to perform high-throughput inference by exploiting the computation time-accuracy trade-off. Without the need for retraining, a two-stage architecture tailored for any given FPGA device is generated, consisting of a low- and a high-precision unit. A confidence evaluation unit is employed between them to identify misclassified cases at run time and forward them to the high-precision unit or terminate computation. Experiments demonstrate that CascadeCNN achieves a performance boost of up to 55% for VGG-16 and 48% for AlexNet over the baseline design for the same resource budget and accuracy.
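The two-stage routing logic can be sketched with stand-in classifiers (hypothetical softmax outputs in place of the quantised CNN stages; the threshold value is an illustrative choice): the cheap low-precision stage answers first, and only inputs whose top score falls below a confidence threshold are forwarded to the expensive high-precision stage.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cascade_predict(logits_low, logits_high, threshold=0.9):
    """Two-stage cascade: trust the low-precision stage when it is
    confident, otherwise fall back to the high-precision stage.
    Returns (predictions, fraction forwarded to the high stage)."""
    p_low = softmax(logits_low)
    confident = p_low.max(axis=1) >= threshold
    preds = np.where(confident,
                     p_low.argmax(axis=1),
                     softmax(logits_high).argmax(axis=1))
    return preds, 1.0 - confident.mean()

# Toy 4-class logits for 3 inputs from each stage.
low = np.array([[5.0, 0.0, 0.0, 0.0],    # confident -> answered by low stage
                [1.0, 0.9, 0.8, 0.7],    # ambiguous -> forwarded
                [0.0, 6.0, 0.0, 0.0]])   # confident -> answered by low stage
high = np.array([[5.0, 0.0, 0.0, 0.0],
                 [0.0, 0.0, 4.0, 0.0],
                 [0.0, 6.0, 0.0, 0.0]])
preds, forwarded_frac = cascade_predict(low, high)
```

Only the ambiguous middle input pays the cost of the high-precision stage, which is the source of the throughput gain.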
- We propose a novel data-dependent structured gradient regularizer to increase the robustness of neural networks vis-a-vis adversarial perturbations. Our regularizer can be derived as a controlled approximation from first principles, leveraging the fundamental link between training with noise and regularization. It adds very little computational overhead during learning and is simple to implement generically in standard deep learning frameworks. Our experiments provide strong evidence that structured gradient regularization can act as an effective first line of defense against attacks based on low-level signal corruption.
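The general idea of penalizing the loss's gradient with respect to the input can be illustrated for plain logistic regression, where the penalty has a closed form. Note this is a generic, unstructured input-gradient penalty for illustration only, not the paper's data-dependent structured regularizer; all names below are ours.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_loss(w, X, y, lam=0.1):
    """Logistic loss plus an input-gradient penalty. For this model the
    gradient of the per-example loss w.r.t. the input x is (p - y) * w,
    so the mean squared gradient norm is (p - y)^2 * ||w||^2."""
    p = sigmoid(X @ w)
    eps = 1e-12                                        # numerical safety
    nll = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    grad_pen = np.mean((p - y) ** 2) * np.dot(w, w)    # mean ||dL/dx||^2
    return nll + lam * grad_pen
```

In a deep learning framework the same penalty would be computed by automatic differentiation through the network, which is why the overhead is small.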
- May 23 2018 math.LO arXiv:1805.08732v1Given a cardinal $\lambda$, category forcing axioms for $\lambda$-suitable classes $\Gamma$ are strong forcing axioms which completely decide the theory of the Chang model $\mathcal C_\lambda$, modulo generic extensions via forcing notions from $\Gamma$. $\mathsf{MM}^{+++}$ was the first category forcing axiom to be isolated (by the second author). In this paper we present, without proofs, a general theory of category forcings, and prove the existence of $\aleph_1$-many pairwise incompatible category forcing axioms for $\omega_1$-suitable classes.
- We consider a new stochastic gradient descent algorithm for efficiently solving general min-max optimization problems that arise naturally in distributionally robust learning. Current approaches, which operate on the entire dataset at every step, do not scale well. We address this issue by initially focusing on a subset of the data and progressively expanding this support until it statistically covers the entire dataset.
- In recent years, the Word2Vec model trained with the Negative Sampling loss function has shown state-of-the-art results in a number of machine learning tasks, including language modeling tasks, such as word analogy and word similarity, and in recommendation tasks, through Prod2Vec, an extension that applies to modeling user shopping activity and user preferences. Several methods that aim to improve upon the standard Negative Sampling loss have been proposed. In our paper we pursue more sophisticated Negative Sampling, by leveraging ideas from the field of Generative Adversarial Networks (GANs), and propose Adversarial Negative Sampling. We build upon the recent progress made in stabilizing the training objective of GANs in the discrete data setting, and introduce a new GAN-Word2Vec model. We evaluate our model on the task of basket completion, and show significant improvements in performance over Word2Vec trained using standard loss functions, including Noise Contrastive Estimation and Negative Sampling.
- Combining Bayesian nonparametrics and a forward model selection strategy, we construct parsimonious Bayesian deep networks (PBDNs) that infer capacity-regularized network architectures from the data and require neither cross-validation nor fine-tuning when training the model. One of the two essential components of a PBDN is the development of a special infinite-wide single-hidden-layer neural network, whose number of active hidden units can be inferred from the data. The other one is the construction of a greedy layer-wise learning algorithm that uses a forward model selection criterion to determine when to stop adding another hidden layer. We develop both Gibbs sampling and stochastic gradient descent based maximum a posteriori inference for PBDNs, providing state-of-the-art classification accuracy and interpretable data subtypes near the decision boundaries, while maintaining low computational complexity for out-of-sample prediction.
- May 23 2018 cs.SI arXiv:1805.08718v1This paper explores the use of language models to predict 20 human traits from users' Facebook status updates. The data was collected by the myPersonality project, and includes user statuses along with their personality, gender, political identification, religion, race, satisfaction with life, IQ, self-disclosure, fair-mindedness, and belief in astrology. A single interpretable model matches state-of-the-art results on well-studied tasks such as predicting gender and personality, and sets the standard on other traits such as IQ, sensational interests, political identity, and satisfaction with life. Additionally, highly weighted words are published for each trait. These lists are valuable for creating hypotheses about human behavior, as well as for understanding what information a model is extracting. Using performance and extracted features we analyze models built on social media. The real world problems we explore include gendered classification bias and Cambridge Analytica's use of psychographic models.
- May 23 2018 cs.CV arXiv:1805.08717v1Reliable markerless motion tracking of multiple people participating in complex group activity from multiple handheld cameras is challenging due to frequent occlusions, strong viewpoint and appearance variations, and asynchronous video streams. The key to solving this problem is to reliably associate the same person across distant viewpoint and temporal instances. In this work, we combine motion tracking, mutual exclusion constraints, and multiview geometry in a multitask learning framework to automatically adapt a generic person appearance descriptor to the domain videos. Tracking is formulated as a spatiotemporally constrained clustering using the adapted person descriptor. Physical human constraints are exploited to reconstruct accurate and consistent 3D skeletons for every person across the entire sequence. We show significant improvement in association accuracy (up to 18%) in events with up to 60 people and 3D human skeleton reconstruction (5 to 10 times) over the baseline for events captured "in the wild".
- In this paper we consider a ranking problem in which we would like to order a set of items by utility or relevance, while also considering the visibility of different groups of items. To solve this problem, we adopt a supervised learning to rank approach that learns a ranking function from a set of training examples, which are queries and ranked lists of documents for each query. We consider that the elements to be ranked are divided into two groups: protected and non-protected. Following long-standing empirical observations showing that users of information retrieval systems rarely look past the first few results, we consider that some items receive more exposure than others. Our objective is to produce a ranker that is able to reproduce the ordering of the training set, which is the standard objective in learning to rank, but that additionally gives protected elements sufficient exposure, compared to non-protected elements. We demonstrate how to describe this objective formally, how to achieve it effectively and implement it, and present an experimental study describing how large differences in exposure can be reduced without having to introduce large distortions in the ranking utility.
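The notion of position-dependent exposure can be made concrete with the standard logarithmic position discount from information retrieval. This sketch (the function name and group encoding are ours, not the paper's) compares average exposure between protected and non-protected items in a single ranking:

```python
import math

def group_exposure(ranking, protected):
    """Average position-discounted exposure per group.
    ranking: list of item ids, best-ranked first.
    protected: set of ids belonging to the protected group.
    Uses the standard logarithmic discount 1 / log2(position + 1)."""
    exposure = {True: [], False: []}
    for pos, item in enumerate(ranking, start=1):
        exposure[item in protected].append(1.0 / math.log2(pos + 1))
    avg = lambda xs: sum(xs) / max(len(xs), 1)
    return avg(exposure[True]), avg(exposure[False])
```

A fairness-aware learning-to-rank objective would penalize large gaps between the two returned averages while preserving the utility ordering.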
- May 23 2018 physics.soc-ph physics.data-an arXiv:1805.08711v1The way the topological structure goes from a decoupled state into a coupled one in multiplex networks has been widely studied by means of analytical and numerical studies, involving models of artificial networks. In general, these experiments assume uniform interconnections between layers offering, on the one hand, an analytical treatment of the structural properties of multiplex networks but, on the other hand, losing applicability to real networks where heterogeneity of the links' weights is an intrinsic feature. In this paper, we study 2-layer multiplex networks of musicians whose layers correspond to empirical datasets containing, and linking, information on: (i) collaboration between them and (ii) musical similarities. In our model, connections between the collaboration and similarity layers exist, but they are not ubiquitous for all nodes. Specifically, inter-layer links are created (and weighted) based on structural resemblances between the neighborhoods of an artist, taking into account the level of interaction at each layer. Next, we evaluate the effect that the heterogeneity of the weights of the inter-layer links has on the structural properties of the whole network, namely the second smallest eigenvalue of the Laplacian matrix (algebraic connectivity). Our results show a transition in the value of the algebraic connectivity that is far from classical theoretical predictions where the weight of the inter-layer links is considered to be homogeneous.
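The algebraic connectivity referred to here is simply the second-smallest eigenvalue of the graph Laplacian $L = D - W$. A minimal sketch for a weighted adjacency matrix:

```python
import numpy as np

def algebraic_connectivity(adjacency):
    """Second-smallest eigenvalue of the (weighted) graph Laplacian L = D - W.
    Positive iff the graph is connected."""
    W = np.asarray(adjacency, dtype=float)
    L = np.diag(W.sum(axis=1)) - W                 # Laplacian: degree minus weights
    eigvals = np.sort(np.linalg.eigvalsh(L))       # ascending; L is symmetric
    return eigvals[1]
```

For a multiplex network the same computation applies to the supra-Laplacian of the full two-layer system, with the inter-layer link weights entering `W`.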
- May 23 2018 cs.CL arXiv:1805.08701v1Building tools for code-mixed data is rapidly gaining popularity in the NLP research community as such data is exponentially rising on social media. Working with code-mixed data poses several challenges, especially due to grammatical inconsistencies and spelling variations, in addition to all the previously known challenges of social media scenarios. In this article, we present a novel architecture focusing on normalizing phonetic typing variations, which are commonly seen in code-mixed data. One of the main features of our architecture is that in addition to normalizing, it can also be utilized for back-transliteration and word identification in some cases. Our model achieved an accuracy of 90.27% on the test data.
- May 23 2018 astro-ph.SR arXiv:1805.08697v1Eclipsing binary systems with pulsating components offer a unique possibility to accurately measure the most important parameters of pulsating stars, to study their evolution, and to test the pulsation theory. I will show what we can learn about the pulsating stars from the analysis of such systems and how we can do it. Special attention will be paid to the mass, radius, p-factor, and distance determination. Although the core of the method is based on the observations of double-lined eclipsing spectroscopic binaries, with the help of the pulsation theory it is also possible to measure absolute parameters for single-lined binaries.
- May 23 2018 cs.CV arXiv:1805.08685v1Recent research has widely explored the problem of aesthetics assessment of images with generic content. However, few approaches have been specifically designed to predict the aesthetic quality of images containing human faces, which make up a massive portion of photos on the web. This paper introduces a method for aesthetic quality assessment of images with faces. We exploit three different Convolutional Neural Networks to encode information regarding perceptual quality, global image aesthetics, and facial attributes; then, a model is trained to combine these features to explicitly predict the aesthetics of images containing faces. Experimental results show that our approach outperforms existing methods for both binary, i.e. low/high, and continuous aesthetic score prediction on four different state-of-the-art databases.
- May 23 2018 cs.CV arXiv:1805.08676v1We propose a geometric convexity shape prior preservation method for variational level set based image segmentation methods. Our method is built upon the fact that the level set of a convex signed distance function must be convex. This property enables us to transfer a complicated geometrical convexity prior into a simple inequality constraint on the function. An active set based Gauss-Seidel iteration is used to handle this constrained minimization problem, yielding an efficient algorithm. We apply our method to region and edge based level set segmentation models, including the Chan-Vese (CV) model, with the guarantee that the segmented region will be convex. Experimental results show the effectiveness and quality of the proposed model and algorithm.
- May 23 2018 astro-ph.IM arXiv:1805.08675v1Cherenkov light induced by radioactive decay products is one of the major sources of background light for deep-sea neutrino telescopes such as ANTARES. These decays are at the same time a powerful calibration source. Using data collected by the ANTARES neutrino telescope from mid 2008 to 2017, the time evolution of the photon detection efficiency of optical modules is studied. A modest loss of only 20% in 9 years is observed. The relative time calibration between adjacent modules is derived as well.
- Parameterizing the approximate posterior of a generative model with neural networks has become a common theme in recent machine learning research. While providing appealing flexibility, this approach makes it difficult to impose or assess structural constraints such as conditional independence. We propose a framework for learning representations that relies on Auto-Encoding Variational Bayes and whose search space is constrained via kernel-based measures of independence. In particular, our method employs the $d$-variable Hilbert-Schmidt Independence Criterion (dHSIC) to enforce independence between the latent representations and arbitrary nuisance factors. We show how to apply this method to a range of problems, including learning invariant representations and learning interpretable representations. We also present a full-fledged application to single-cell RNA sequencing (scRNA-seq). In this setting the biological signal is mixed in complex ways with sequencing errors and sampling effects. We show that our method outperforms the state-of-the-art in this domain.
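The Hilbert-Schmidt Independence Criterion underlying such penalties can be illustrated in its two-variable, biased-estimator form (the paper uses the $d$-variable generalization dHSIC); the Gaussian kernel and bandwidth below are our assumptions for the sketch:

```python
import numpy as np

def rbf_gram(x, sigma=1.0):
    """Gaussian-kernel Gram matrix for a 1-D sample."""
    d = x[:, None] - x[None, :]
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC between two 1-D samples.
    Zero in the population iff x and y are independent
    (for a characteristic kernel such as the Gaussian)."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    K, L = rbf_gram(x, sigma), rbf_gram(y, sigma)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```

Used as a training penalty, a term like this pushes the latent representation toward independence from the nuisance variable.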
- A regression method for proportional, or fractional, data with mixed effects is outlined, designed for analysis of datasets in which the outcomes have substantial weight at the bounds. In such cases a normal approximation is particularly unsuitable as it can result in incorrect inference. To resolve this problem, we employ a logistic regression model and then apply a bootstrap method to correct conservative confidence intervals. This paper outlines the theory of the method, and demonstrates its utility using simulated data. Working code for the R platform is provided through the package glmmboot, available on CRAN.
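The bootstrap-correction idea can be sketched as a generic percentile bootstrap over an arbitrary statistic. The actual glmmboot package resamples from fitted mixed-effects logistic models in R; this Python sketch is only an illustration of the resampling step, with all names ours.

```python
import numpy as np

def bootstrap_ci(data, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for an arbitrary statistic:
    resample the data with replacement, recompute the statistic, and take
    empirical quantiles of the resampled values."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    stats = np.array([
        statistic(rng.choice(data, size=len(data), replace=True))
        for _ in range(n_boot)
    ])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

For bounded proportion data the bootstrap interval respects the [0, 1] support, which a normal approximation does not.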
- May 23 2018 math.FA arXiv:1805.08667v1In this paper, we define and study Gevrey spaces in a subelliptic setting, that is, associated with a Hörmander family of vector fields and their corresponding sub-Laplacian. We show some natural relations between the various Gevrey spaces in this setting on general manifolds, and more particular properties on Lie groups with polynomial growth of the volume. In the case of the Heisenberg group and of $SU(2)$, we show that all our descriptions coincide.
- May 23 2018 math.AP arXiv:1805.08666v1In this manuscript we consider a non-local porous medium equation with non-local diffusion effects given by a fractional heat operator $\partial_t + (-\Delta)^s$ in three space dimensions for $3/4\le s\le 1$. Global in time existence of weak solutions is shown by employing a time semi-discretization of the equations, an energy inequality and a uniform estimate for the first moment of the solution to the discretized problem.
- We introduce a Bayesian Gaussian process latent variable model that explicitly captures spatial correlations in data using a parameterized spatial kernel and leveraging structure-exploiting algebra on the model covariance matrices for computational tractability. Inference is made tractable through a collapsed variational bound with similar computational complexity to that of the traditional Bayesian GP-LVM. Inference over partially-observed test cases is achieved by optimizing a "partially-collapsed" bound. Modeling high-dimensional time series systems is enabled through use of a dynamical GP latent variable prior. The model is demonstrated on examples that impute missing data in images and perform super-resolution imputation of missing video frames.
- May 23 2018 cs.CL arXiv:1805.08660v1Multimodal affective computing, learning to recognize and interpret human affects and subjective information from multiple data sources, is still challenging because: (i) it is hard to extract informative features to represent human affects from heterogeneous inputs; (ii) current fusion strategies only fuse different modalities at an abstract level, ignoring time-dependent interactions between modalities. Addressing such issues, we introduce a hierarchical multimodal architecture with attention and word-level fusion to classify utterance-level sentiment and emotion from text and audio data. Our introduced model outperforms the state-of-the-art approaches on published datasets, and we demonstrate that our model is able to visualize and interpret the synchronized attention over modalities.