- Dec 14 2017 quant-ph arXiv:1712.04914v1 Recent progress in building large-scale quantum devices for exploring quantum computing and simulation paradigms has relied upon effective tools for achieving and maintaining good experimental parameters, i.e. tuning up devices. In many cases, including in quantum-dot based architectures, the parameter space grows substantially with the number of qubits, and may become a limit to scalability. Fortunately, machine learning techniques for pattern recognition and image classification using so-called deep neural networks have shown surprising successes for computer-aided understanding of complex systems. In this work, we use deep and convolutional neural networks to characterize states and charge configurations of semiconductor quantum dot arrays when one can only measure a current-voltage characteristic of transport (here conductance) through such a device. For simplicity, we model a semiconductor nanowire connected to leads and capacitively coupled to depletion gates using the Thomas-Fermi approximation and Coulomb blockade physics. We then generate labelled training data for the neural networks, and find at least $90\,\%$ accuracy for charge and state identification for single and double dots purely from the dependence of the nanowire's conductance upon gate voltages. Using these characterization networks, we can then optimize the parameter space to achieve a desired configuration of the array, a technique we call 'auto-tuning'. Finally, we show how such techniques can be implemented in an experimental setting by applying our approach to an experimental data set, and outline further problems in this domain, from using charge sensing data to extensions to full one and two-dimensional arrays, that can be tackled with machine learning.
- Variational Bayes (VB) inference is one of the most important algorithms in machine learning and is widely used in engineering and industry. However, VB is known to suffer from the problem of local optima. In this Letter, we generalize VB using quantum mechanics and propose a new algorithm, which we call quantum annealing variational Bayes (QAVB) inference. We then show that QAVB drastically improves the performance of VB when applied to a clustering problem described by a Gaussian mixture model. Finally, we discuss an intuitive understanding of why QAVB works well.
- Dec 14 2017 quant-ph arXiv:1712.04669v1 Inspired by classical ("actual") Quantum Theory over $\mathbb{C}$ and Modal Quantum Theory (MQT), which is a model of Quantum Theory over certain finite fields, we introduce General Quantum Theory as a Quantum Theory -- in the København interpretation -- over general division rings with involution, in which the inner product "is" a $(\sigma,1)$-Hermitian form $\varphi$. This unites all known such approaches in one and the same theory, and we show that many of the known results such as no-cloning, no-deleting, quantum teleportation and super-dense quantum coding, which are known in classical Quantum Theory over $\mathbb{C}$ and in some MQTs, hold for any General Quantum Theory. On the other hand, in many General Quantum Theories, a geometrical object which we call the "quantum kernel" arises, which is invariant under the unitary group $\mathbf{U}(V,\varphi)$, and which carries the geometry of a so-called polar space. We use this object to construct new quantum (teleportation) coding schemes, which mix quantum theory with the geometry of the quantum kernel (and the action of the unitary group). We also show that in characteristic $0$, every General Quantum Theory over an algebraically closed field behaves like classical Quantum Theory over $\mathbb{C}$ at many levels, and that all such theories share one model, which we pin down as the "minimal model," which is countable and defined over $\overline{\mathbb{Q}}$. Moreover, to make the analogy with classical Quantum Theory even more striking, we show that Born's rule holds in any such theory. So all such theories are not modal at all. Finally, we obtain an extension theory for General Quantum Theories in characteristic $0$ which allows one to extend any such theory over algebraically closed fields (such as classical complex Quantum Theory) to larger theories in which a quantum kernel is present.
- Dec 14 2017 quant-ph arXiv:1712.04925v1 We test Hardy's paradox of non-locality experimentally on the IBM five-qubit quantum computer for the first time. The quantum circuit is constructed on superconducting qubits corresponding to the original Hardy's test of non-locality. Results confirmed the theory that any non-maximally entangled state of two qubits violates Hardy's equations, whereas any maximally entangled state and product state of two qubits do not exhibit Hardy's non-locality. We also point out the difficulties associated with the practical implementation of any Hardy's paradox based quantum protocol and propose three performance measures for any two qubits of any quantum computer.
- Dec 14 2017 quant-ph arXiv:1712.04608v1 This work aims to understand the monogamy of quantum entanglement from a geometrical point of view. By regarding quantum entanglement as a geometrical structure on the state space of quantum systems and treating all entanglement-related properties as emergent from this geometry of entanglement, we assume there exists a genuine general monogamous relation of quantum entanglement with respect to a corresponding genuine entanglement measure Q* which possesses an underlying geometrical origin. We speculate that the monogamous relations with respect to an entanglement measure Q can be understood by comparing the different dimension dependencies of the measures Q and Q*. We give evidence for our conjecture by readdressing two observed properties of the monogamy relations from this geometrical standpoint. Besides this phenomenological explanation of the monogamy of entanglement, we also discuss a fibre-bundle-based candidate solution for the geometry of entanglement and explain how this idea relates to the ER=EPR conjecture and other interesting quantum information processing problems, including monogamy of entanglement, entanglement distillation, bound entanglement and activation, and entanglement catalysis.
- Dec 14 2017 astro-ph.CO arXiv:1712.04928v1 Cosmology is intrinsically intertwined with questions in fundamental physics. The existence of non-baryonic dark matter requires new physics beyond the Standard Model of elementary-particle interactions and Einstein's general relativity, as does the accelerating expansion of the universe. Current tensions between various cosmological measurements may be harbingers of yet more new physics. Progress on understanding dark matter and cosmic acceleration requires long-term, high-precision measurements and excellent control of systematics, demanding observational programs that are often outside the discovery/characterization mode that drives many areas of astronomy. We outline potential programs through which the Hubble Space Telescope (HST) could have a major impact on issues in fundamental physics in the coming years. To realize this impact, we suggest the introduction of an "HST Fundamental Physics" observational program that would be subject to a modified proposal and review process.
- Recently there has been a dramatic increase in the performance of recognition systems due to the introduction of deep architectures for representation learning and classification. However, the mathematical reasons for this success remain elusive. This tutorial will review recent work that aims to provide a mathematical justification for several properties of deep networks, such as global optimality, geometric stability, and invariance of the learned representations.
- Natural calamities and disasters disrupt conventional communication setups, and the wireless bandwidth becomes constrained. A safe and cost-effective solution for communication and data access in such scenarios has long been needed. Light-Fidelity (Li-Fi), which promises wireless access to data at high speeds using visible light, can be a good option. Visible light, besides being safe to use for wireless access in such affected environments, also provides illumination. Importantly, when a Li-Fi unit is attached to an air balloon and a network of such Li-Fi balloons is coordinated to form a Li-Fi balloon network, data can be accessed anytime and anywhere required, and hence many lives can be tracked and saved. We propose this idea of a Li-Fi balloon and give an overview of its design using the Philips Li-Fi hardware. Further, we propose the concept of a balloon network and coin the acronym LiBNet for it. We consider the balloons to be arranged as a homogeneous Poisson point process in the LiBNet, and derive the mean co-channel interference for such an arrangement.
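The mean co-channel interference for transmitters forming a homogeneous Poisson point process follows from Campbell's theorem; the sketch below uses a generic transmit power $P$ and path-loss law $\ell(r)$ of my own choosing, not necessarily the model adopted for the LiBNet:

```latex
% Mean interference at the origin from a homogeneous PPP \Phi of
% intensity \lambda, transmit power P, and path-loss law \ell(r):
\mathbb{E}[I]
= \mathbb{E}\Big[\sum_{x \in \Phi} P\,\ell(\|x\|)\Big]
= \lambda \int_{\mathbb{R}^2} P\,\ell(\|x\|)\,\mathrm{d}x
= 2\pi \lambda P \int_0^{\infty} \ell(r)\, r \,\mathrm{d}r .
```

The integral converges only for path-loss laws that are integrable at infinity (and bounded at the origin), which is why bounded path-loss models are often assumed in such derivations.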
- Dec 14 2017 astro-ph.CO astro-ph.IM arXiv:1712.04512v1 This paper introduces Colossus, a public, open-source Python package for calculations related to cosmology, the large-scale structure of matter in the universe, and the properties of dark matter halos. The code is designed to be fast and easy to use, with a coherent, well-documented user interface. The cosmology module implements FLRW cosmologies including curvature, relativistic species, and different dark energy equations of state, and provides fast computations of the linear matter power spectrum, variance, and correlation function. The large-scale structure module is concerned with the properties of peaks in Gaussian random fields and halos in a statistical sense, including their peak height, peak curvature, halo bias, and mass function. The halo module deals with spherical overdensity radii and masses, density profiles, concentration, and the splashback radius. To facilitate the rapid exploration of these quantities, Colossus implements about 40 different fitting functions from the literature. I discuss the core routines in detail, with a particular emphasis on their accuracy. Colossus is available at bitbucket.org/bdiemer/colossus.
- A celebrated and controversial hypothesis conjectures that some biological systems --parts, aspects, or groups of them-- may extract important functional benefits from operating at the edge of instability, halfway between order and disorder, i.e. in the vicinity of the critical point of a phase transition. Criticality has been argued to provide biological systems with an optimal balance between robustness against perturbations and flexibility to adapt to changing conditions, as well as to confer on them optimal computational capabilities, huge dynamical repertoires, unparalleled sensitivity to stimuli, etc. Criticality, with its concomitant scale invariance, can be conjectured to emerge in living systems as the result of adaptive and evolutionary processes that, for reasons to be fully elucidated, select for it as a template upon which higher layers of complexity can rest. This hypothesis is very suggestive as it proposes that criticality could constitute a general and common organizing strategy in biology stemming from the physics of phase transitions. However, despite its thrilling implications, this is still in its embryonic state as a well-founded theory and, as such, it has elicited some healthy skepticism. From the experimental side, the advent of high-throughput technologies has created new prospects in the exploration of biological systems, and empirical evidence in favor of criticality has proliferated, with examples ranging from endogenous brain activity and gene-expression patterns, to flocks of birds and insect-colony foraging, to name but a few...
- Radio-frequency (RF) tomographic imaging is a promising technique for inferring multi-dimensional physical space by processing RF signals traversing a region of interest. However, conventional RF tomography schemes are generally based on vector compressed sensing, which ignores the geometric structure of the target space and leads to low recovery precision. The recently proposed transform-based tensor model is more appropriate for sensory data processing, as it helps exploit the geometric structure of the three-dimensional target and increases the recovery precision. In this paper, we propose a novel tensor sensing approach that achieves highly accurate estimation for real-world three-dimensional spaces. First, we use the transform-based tensor model to formulate RF tomographic imaging as a tensor sensing problem, and propose a fast alternating minimization algorithm called Alt-Min. Second, we give a detailed routine to carry out the algorithm; moreover, we propose a key optimization that resolves the problems of huge memory consumption and long computation time. Finally, we present an evaluation of our Alt-Min approach using IKEA 3D data and demonstrate significant improvements in recovery error and convergence speed compared to prior tensor-based compressed sensing.
- Dec 14 2017 cs.GT arXiv:1712.04918v1 The celebrated Gibbard-Satterthwaite theorem states that any surjective social choice function which is defined over the universal domain of preferences and is strategy-proof must be dictatorial. Aswal, Chatterji and Sen generalize the Gibbard-Satterthwaite theorem by showing that it still holds if the universal domain is replaced with the linked domain. In this note, we show that determining whether an election is linked can be done in polynomial time.
- The singular values of products of standard complex Gaussian random matrices, or of sub-blocks of Haar distributed unitary matrices, have the property that their probability distribution has an explicit, structured form referred to as a polynomial ensemble. Furthermore, the corresponding bi-orthogonal system can be determined in terms of Meijer G-functions, and the correlation kernel is given as an explicit double contour integral. It has recently been shown that the Hermitised product $X_M \cdots X_2 X_1A X_1^T X_2^T \cdots X_M^T$, where each $X_i$ is a standard real Gaussian matrix and $A$ is real anti-symmetric, exhibits analogous properties. Here we use the theory of spherical functions and transforms to present a theory which, for even dimensions, includes these properties of the latter product as a special case. As an example, we show that the theory also allows for a treatment of this class of Hermitised products when the $X_i$ are chosen as sub-blocks of Haar distributed real orthogonal matrices.
- We develop a general class of two-step algorithms for heterogeneous treatment effect estimation in observational studies. We first estimate marginal effects and treatment propensities to form an objective function that isolates the heterogeneous treatment effects, and then optimize the learned objective. This approach has several advantages over existing methods. From a practical perspective, our method is very flexible and easy to use: In both steps, we can use any method of our choice, e.g., penalized regression, a deep net, or boosting; moreover, these methods can be fine-tuned by cross-validating on the learned objective. Meanwhile, in the case of penalized kernel regression, we show that our method has a quasi-oracle property, whereby even if our pilot estimates for marginal effects and treatment propensities are not particularly accurate, we achieve the same regret bounds as an oracle who has a priori knowledge of these nuisance components. We implement variants of our method based on both penalized regression and convolutional neural networks, and find promising performance relative to existing baselines.
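The learned objective described above can be sketched in standard notation for this setting (my notation, hedged as a generic rendering of the two-step idea rather than the paper's exact display): with outcome $Y_i$, treatment $W_i$, pilot estimates $\hat{m}(x) \approx \mathbb{E}[Y \mid X = x]$ for the marginal effect and $\hat{e}(x) \approx \mathbb{P}[W = 1 \mid X = x]$ for the propensity, the second step fits the treatment effect $\tau(\cdot)$ by

```latex
\hat{\tau}(\cdot) \;=\; \operatorname*{arg\,min}_{\tau}\;
\frac{1}{n} \sum_{i=1}^{n}
\Big[ \big( Y_i - \hat{m}(X_i) \big)
    - \big( W_i - \hat{e}(X_i) \big)\, \tau(X_i) \Big]^2
\;+\; \Lambda_n\big( \tau(\cdot) \big),
```

where $\Lambda_n$ is a regularizer (a penalty for penalized regression, or implicit regularization for a deep net or boosting).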
- Deep learning has demonstrated its power in many application domains, especially in image and speech recognition. As the backbone of deep learning, deep neural networks (DNNs) consist of multiple layers of various types with hundreds to thousands of neurons. Embedded platforms are now becoming essential for deep learning deployment due to their portability, versatility, and energy efficiency. The large model size of DNNs, while providing excellent accuracy, also burdens embedded platforms with intensive computation and storage. Researchers have investigated reducing DNN model size with negligible accuracy loss. This work proposes a Fast Fourier Transform (FFT)-based DNN training and inference model suitable for embedded platforms, with reduced asymptotic complexity of both computation and storage, which distinguishes our approach from existing ones. We develop training and inference algorithms with the FFT as the computing kernel, and deploy the FFT-based inference model on embedded platforms, achieving extraordinary processing speed.
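The core identity behind FFT-based layers is that (circular) convolution becomes pointwise multiplication in the frequency domain, cutting per-layer cost from $O(n^2)$ to $O(n \log n)$ when a fast transform is used. A minimal illustrative sketch of that identity (a naive $O(n^2)$ DFT stands in for a true FFT, and this is not the paper's actual algorithm):

```python
import cmath

def dft(x, inverse=False):
    # Naive discrete Fourier transform; a real FFT computes this in O(n log n).
    n = len(x)
    s = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(s * 2j * cmath.pi * m * k / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out

def circ_conv_fft(a, b):
    # Pointwise product in the frequency domain == circular convolution in time.
    A, B = dft(a), dft(b)
    return [v.real for v in dft([x * y for x, y in zip(A, B)], inverse=True)]

def circ_conv_direct(a, b):
    # Direct O(n^2) circular convolution, for comparison.
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]

a = [1.0, 2.0, 3.0, 4.0]
b = [0.5, -1.0, 0.0, 2.0]
fft_res = circ_conv_fft(a, b)
direct = circ_conv_direct(a, b)
assert all(abs(x - y) < 1e-9 for x, y in zip(fft_res, direct))
```

In an FFT-based layer, the filter weights can also be stored in the frequency domain, which is the source of the storage savings the abstract mentions.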
- We prove characterization theorems for relative entropy (also known as Kullback-Leibler divergence), q-logarithmic entropy (also known as Tsallis entropy), and q-logarithmic relative entropy. All three have been characterized axiomatically before, but we show that earlier proofs can be simplified considerably, at the same time relaxing some of the hypotheses.
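For reference, the standard definitions of the first two quantities being characterized (sign and normalization conventions vary across the literature; these are the common ones):

```latex
% Relative entropy (Kullback--Leibler divergence) of p from q:
D(p \,\|\, q) \;=\; \sum_i p_i \log \frac{p_i}{q_i},
\qquad
% q-logarithmic (Tsallis) entropy, recovering Shannon entropy as q \to 1:
S_q(p) \;=\; \frac{1}{q-1} \Big( 1 - \sum_i p_i^{\,q} \Big).
```

The axiomatic characterizations referred to pin these functionals down (up to a constant) from properties such as a chain rule and continuity.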
- We consider a class of two-dimensional Schrödinger operators with a singular interaction of the $\delta$ type and a fixed strength $\beta$ supported by an infinite family of concentric, equidistantly spaced circles, and discuss what happens below the essential spectrum when the system is amended by an Aharonov-Bohm flux $\alpha\in [0,\frac12]$ in the center. It is shown that if $\beta\ne 0$, there is a critical value $\alpha_\mathrm{crit} \in(0,\frac12)$ such that the discrete spectrum has an accumulation point when $\alpha<\alpha_\mathrm{crit} $, while for $\alpha\ge\alpha_\mathrm{crit} $ the number of eigenvalues is at most finite; in particular, the discrete spectrum is empty for any fixed $\alpha\in (0,\frac12)$ and $|\beta|$ small enough.
- Dec 14 2017 math.FA arXiv:1712.04896v1We study integrals of the form $\int_{\Omega}f\left( d\omega_1 , \ldots , d\omega_m \right), $ where $m \geq 1$ is a given integer, $1 \leq k_{i} \leq n$ are integers and $\omega_{i}$ is a $(k_{i}-1)$-form for all $1 \leq i \leq m$ and $ f:\prod_{i=1}^m \Lambda^{k_i}\left( \mathbb{R}^{n}\right) \rightarrow\mathbb{R}$ is a continuous function. We introduce the appropriate notions of convexity, namely vectorial ext. one convexity, vectorial ext. quasiconvexity and vectorial ext. polyconvexity. We prove weak lower semicontinuity theorems and weak continuity theorems and conclude with applications to minimization problems. These results generalize the corresponding results in both classical vectorial calculus of variations and the calculus of variations for a single differential form.
- Bayesian approximate message passing (BAMP) is an efficient method in compressed sensing that is nearly optimal in the minimum mean squared error (MMSE) sense. Vector BAMP performs joint recovery of multiple vectors with identical support and accounts for correlations in the signal of interest and in the noise. In this paper, we show how to reduce the complexity of vector BAMP via a simple joint decorrelation (diagonalization) transform of the signal and noise vectors, which also facilitates the subsequent performance analysis. We prove that BAMP and the corresponding state evolution (SE) are equivariant with respect to the joint decorrelation transform and preserve diagonality of the residual noise covariance for the Bernoulli-Gauss (BG) prior. We use these results to analyze the dynamics and the mean squared error (MSE) performance of BAMP via the replica method, and thereby understand the impact of signal correlation and of the number of jointly sparse signals.
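A joint decorrelation of the kind described is typically realized by whitening with respect to a covariance matrix; a generic sketch in my own notation (the paper's exact transform may differ):

```latex
% Eigendecomposition of a covariance matrix C = U \Lambda U^{\mathsf{T}}
% yields the decorrelating (whitening) transform
\tilde{\mathbf{y}} \;=\; \Lambda^{-1/2} U^{\mathsf{T}} \mathbf{y},
\qquad
\operatorname{Cov}(\tilde{\mathbf{y}})
\;=\; \Lambda^{-1/2} U^{\mathsf{T}} C \, U \Lambda^{-1/2} \;=\; I .
```

After such a transform, the components of the vector problem decouple, which is what makes the subsequent per-component analysis tractable.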
- We propose an optimization approach for determining both hardware and software parameters for the efficient implementation of a family of applications, dense stencil computations, on programmable GPGPUs. We first introduce a simple analytical model for the silicon area usage of accelerator architectures and a workload characterization of stencil computations. We combine this characterization with a parametric execution time model and formulate a mathematical optimization problem that seeks to maximize a common objective function of all the hardware and software parameters. The solution to this problem therefore "solves" the codesign problem: simultaneously choosing software and hardware parameters to optimize total performance. We validate this approach by proposing architectural variants of the NVIDIA Maxwell GTX-980 (respectively, Titan X) specifically tuned to a predetermined workload of four common 2D stencils (Heat, Jacobi, Laplacian, and Gradient) and two 3D ones (Heat and Laplacian). Our model predicts that performance would potentially improve by 28% (respectively, 33%) with simple tweaks to hardware parameters such as adapting coarse- and fine-grained parallelism by changing the number of streaming multiprocessors and the number of compute cores each contains. We propose a set of Pareto-optimal design points to exploit the trade-off between performance and silicon area, and show that by additionally eliminating GPU caches, we can obtain a further 2-fold improvement.
- Dec 14 2017 cs.CL arXiv:1712.04853v1 The advantages of neural machine translation (NMT) have been extensively validated for offline translation of several language pairs for different domains of spoken and written language. However, research on interactive learning of NMT by adaptation to human post-edits has so far been confined to simulation experiments. We present the first user study on online adaptation of NMT to user post-edits. Our study involves 29 human subjects whose post-editing effort and translation quality were measured on about 4,500 interactions of a human post-editor and a machine translation system integrating an online adaptive learning algorithm. Our experimental results show a significant reduction of human post-editing effort due to online adaptation in NMT according to several evaluation metrics, including hTER, hBLEU, and KSMR. Furthermore, we found significant improvements in BLEU/TER between NMT outputs and human references, and a strong correlation of these improvements with quality improvements of post-edits.
- Dec 14 2017 cs.CV arXiv:1712.04851v1 In this paper we study 3D convolutional networks for video understanding tasks. Our starting point is the state-of-the-art I3D model, which "inflates" all the 2D filters of the Inception architecture to 3D. We first consider "deflating" the I3D model at various levels to understand the role of 3D convolutions. Interestingly, we found that 3D convolutions at the top layers of the network contribute more than 3D convolutions at the bottom layers, while also being computationally more efficient. This indicates that I3D is better at capturing high-level temporal patterns than low-level motion signals. We also consider replacing 3D convolutions with spatiotemporal-separable 3D convolutions (i.e., replacing convolution with a k × k × k filter by a 1 × k × k filter followed by a k × 1 × 1 filter); we show that such a model, which we call S3D, is 1.5x more computationally efficient (in terms of FLOPS) than I3D, and achieves better accuracy. Finally, we explore spatiotemporal feature gating on top of S3D. The resulting model, which we call S3D-G, outperforms the state-of-the-art I3D model by 3.5% accuracy on Kinetics and reduces the FLOPS by 34%. It also achieves new state-of-the-art performance when transferred to other action classification (UCF-101 and HMDB-51) and detection (UCF-101 and JHMDB) datasets.
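The saving from the spatiotemporal factorization can be sketched with a simple multiply-accumulate count; the channel dimensions and the exact placement of the temporal convolution below are my own assumptions for illustration, not the S3D specification:

```python
def conv3d_macs(k, cin, cout, t, h, w):
    # Multiply-accumulates for a full k*k*k 3D convolution
    # over a T x H x W feature map ("same" padding assumed).
    return (k ** 3) * cin * cout * t * h * w

def sep_conv3d_macs(k, cin, cout, t, h, w):
    # Spatial 1*k*k convolution followed by temporal k*1*1 convolution.
    spatial = (k ** 2) * cin * cout * t * h * w
    temporal = k * cout * cout * t * h * w
    return spatial + temporal

full = conv3d_macs(3, 64, 64, 8, 28, 28)
sep = sep_conv3d_macs(3, 64, 64, 8, 28, 28)
assert sep < full  # cost drops from O(k^3) to O(k^2 + k) per output element
```

With equal input and output channels and k = 3, the per-layer ratio is 27/12 = 2.25x; the overall model-level saving is smaller because not every layer is a 3D convolution.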
- Dec 14 2017 cs.CV arXiv:1712.04850v1 As an agent moves through the world, the apparent motion of scene elements is (usually) inversely proportional to their depth. It is natural for a learning agent to associate image patterns with the magnitude of their displacement over time: as the agent moves, far away mountains don't move much; nearby trees move a lot. This natural relationship between the appearance of objects and their motion is a rich source of information about the world. In this work, we train a deep network, using fully automatic supervision, to predict relative scene depth from single images. The depth training images are automatically derived from simple videos of cars moving through a scene, using classic depth from motion techniques, and no human-provided labels. We show that this pretext task of predicting depth from a single image induces features in the network that result in large improvements in a set of downstream tasks including semantic segmentation, joint road segmentation and car detection, and monocular (absolute) depth estimation, over a network trained from scratch. In particular, our pre-trained model outperforms an ImageNet counterpart for the monocular depth estimation task. Unlike work that analyzes video paired with additional information about direction of motion, our agent learns from "raw egomotion" video recorded from cars moving through the world. Unlike methods that require videos of moving objects, we neither depend on, nor are disrupted by, moving objects in the video. Indeed, we can benefit from predicting depth in the videos associated with various downstream tasks, showing that we can adapt to new scenes in an unsupervised manner to improve performance. By doing so, we achieve consistently better results over several different urban scene understanding tasks, obtaining results that are competitive with state-of-the-art methods for monocular depth estimation.
- Dec 14 2017 cs.CV arXiv:1712.04837v1 In this work, we tackle the problem of instance segmentation, the task of simultaneously solving object detection and semantic segmentation. Towards this goal, we present a model, called MaskLab, which produces three outputs: box detection, semantic segmentation, and direction prediction. Building on top of the Faster-RCNN object detector, the predicted boxes provide accurate localization of object instances. Within each region of interest, MaskLab performs foreground/background segmentation by combining semantic and direction prediction. Semantic segmentation assists the model in distinguishing between objects of different semantic classes including background, while the direction prediction, estimating each pixel's direction towards its corresponding center, allows separating instances of the same semantic class. Moreover, we explore the effect of incorporating recent successful methods from both segmentation and detection (i.e., atrous convolution and hypercolumn). Our proposed model is evaluated on the COCO instance segmentation benchmark and shows comparable performance with other state-of-the-art models.
- Dec 14 2017 cs.CV arXiv:1712.04833v1 Symbol detection techniques in online handwritten graphics (e.g. diagrams and mathematical expressions) consist of methods specifically designed for a single graphic type. In this work, we evaluate the Faster R-CNN object detection algorithm as a general method for detection of symbols in handwritten graphics. We evaluate different configurations of the Faster R-CNN method, and point out issues related to the handwritten nature of the data. Considering the online recognition context, we evaluate efficiency and accuracy trade-offs of using deep neural networks of different complexities as feature extractors. We evaluate the method on publicly available flowchart and mathematical expression (CROHME-2016) datasets. Results show that Faster R-CNN can be effectively used on both datasets, enabling the possibility of developing general methods for symbol detection, and furthermore, general graphic understanding methods that could be built on top of the algorithm.
- Crowdsourcing has become a popular method for collecting labeled training data. However, in many practical scenarios traditional labeling can be difficult for crowdworkers (for example, if the data is high-dimensional or unintuitive, or the labels are continuous). In this work, we develop a novel model for crowdsourcing that can complement standard practices by exploiting people's intuitions about groups and relations between them. We employ a recent machine learning setting, called Ballpark Learning, that can estimate individual labels given only coarse, aggregated signal over groups of data points. To address the important case of continuous labels, we extend the Ballpark setting (which focused on classification) to regression problems. We formulate the problem as a convex optimization problem and propose fast, simple methods with an innate robustness to outliers. We evaluate our methods on real-world datasets, demonstrating how useful constraints about groups can be harnessed from a crowd of non-experts. Our methods can rival supervised models trained on many true labels, and can obtain considerably better results from the crowd than a standard label-collection process (for a lower price). By collecting rough guesses on groups of instances and using machine learning to infer the individual labels, our lightweight framework is able to address core crowdsourcing challenges and train machine learning models in a cost-effective way.
- We develop a phase space formulation of quantum mechanics based on unitary irreducible representations of classical phase spaces. We use a quantum action functional to derive the basic equations. In principle, our formulation is equivalent to the Hilbert space formulation. However, the former allows for consistent truncations to reduced phase spaces in which approximate quantum motions can be derived. We predict that our approach can be very useful in the domain of quantum cosmology and therefore, we use the cosmological phase spaces to establish the basic equations of the new formalism.
- Nowadays, protecting trust in social sciences also means engaging in open community dialogue, which helps to safeguard robustness and improve efficiency of research methods. The combination of open data, open review and open dialogue may sound simple but implementation in the real world will not be straightforward. However, in view of Begley and Ellis's (2012) statement that, "the scientific process demands the highest standards of quality, ethics and rigour," they are worth implementing. More importantly, they are feasible to work on and likely will help to restore plausibility to social sciences research. Therefore, I feel it likely that the triplet of open data, open review and open dialogue will gradually emerge to become policy requirements regardless of the research funding source.
- Dec 14 2017 hep-ph astro-ph.CO arXiv:1712.04793v1One of the most promising strategies to identify the nature of dark matter consists in the search for new particles at accelerators and with so-called direct detection experiments. Working within the framework of simplified models, and making use of machine learning tools to speed up statistical inference, we address the question of what we can learn about dark matter from a detection at the LHC and a forthcoming direct detection experiment. We show that with a combination of accelerator and direct detection data, it is possible to identify newly discovered particles as dark matter, by reconstructing their relic density assuming they are weakly interacting massive particles (WIMPs) thermally produced in the early Universe, and demonstrating that it is consistent with the measured dark matter abundance. An inconsistency between these two quantities would instead point either towards additional physics in the dark sector, or towards a non-standard cosmology, with a thermal history substantially different from that of the standard cosmological model.
- Dec 14 2017 cs.CV arXiv:1712.04778v1 The amount of training data that is required to train a classifier scales with the dimensionality of the feature data. In hyperspectral remote sensing, feature data can potentially become very high dimensional. However, the amount of training data is oftentimes limited. Thus, one of the core challenges in hyperspectral remote sensing is how to perform multi-class classification using only relatively few training data points. In this work, we address this issue by enriching the feature matrix with synthetically generated sample points. This synthetic data is sampled from a GMM fitted to each class of the limited training data. Although the true distribution of features may not be perfectly modeled by the fitted GMM, we demonstrate that a moderate augmentation by these synthetic samples can effectively replace a part of the missing training samples. We show the efficacy of the proposed approach on two hyperspectral datasets. The median gain in classification performance is $5\%$. It is also encouraging that this performance gain is remarkably stable for large variations in the number of added samples, which makes it much easier to apply this method to real-world applications.
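A minimal pure-Python sketch of the per-class augmentation idea, with a single diagonal Gaussian standing in for the fitted GMM (a real implementation would fit a multi-component mixture, e.g. scikit-learn's GaussianMixture; the function names here are hypothetical):

```python
import random
import statistics

def augment_class(samples, n_new, rng=random.Random(0)):
    """Fit an independent Gaussian per feature dimension to one class's
    training samples and draw synthetic points from it (a one-component,
    diagonal-covariance stand-in for a fitted GMM)."""
    dims = list(zip(*samples))  # transpose: one tuple per feature dimension
    mus = [statistics.fmean(d) for d in dims]
    sigmas = [statistics.stdev(d) for d in dims]
    return [[rng.gauss(m, s) for m, s in zip(mus, sigmas)]
            for _ in range(n_new)]

# Four real training points in a 2D feature space for one class.
real = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.2], [1.1, 2.1]]
synthetic = augment_class(real, n_new=10)
augmented = real + synthetic  # enriched feature matrix for the classifier
assert len(augmented) == 14
```

The synthetic points are drawn around the class's empirical mean, so a moderate number of them densifies the class region without requiring new labeled measurements.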
- Dec 14 2017 cs.AR arXiv:1712.04771v1With the emergence in recent years of big-data applications in machine learning, speech recognition, artificial intelligence, and DNA sequencing, the computer architecture community faces an explosive growth in data scale. To achieve high efficiency in data-intensive computing, heterogeneous accelerators targeting the latest applications have become a hot topic in computer architecture. At present, heterogeneous accelerators are mainly implemented with computing units such as Application-Specific Integrated Circuits (ASICs), Graphics Processing Units (GPUs), and Field-Programmable Gate Arrays (FPGAs). Among these architectures, FPGA-based reconfigurable accelerators have two merits. First, an FPGA contains a large number of reconfigurable circuits, which can satisfy the performance and power-consumption requirements of a specific running application. Second, FPGA-based reconfigurable architectures enable rapid prototyping and offer excellent customizability and reconfigurability. A batch of accelerator designs based on FPGAs and other reconfigurable architectures has recently appeared in top-tier computer architecture conferences. To review this recent work, this survey takes the latest research on reconfigurable accelerator architectures and their algorithm applications as its basis. We compare active research issues and application domains, and analyze the advantages, disadvantages, and challenges of reconfigurable accelerators. Finally, we discuss likely future directions for accelerator architectures, hoping to provide a reference for computer architecture researchers.
- Dec 14 2017 math.PR arXiv:1712.04770v1We investigate the tail asymptotic behavior of the sojourn time for a large class of centered Gaussian processes $X$, in both continuous- and discrete-time framework. All results obtained here are new for the discrete-time case. In the continuous-time case, we complement the investigations of [1,2] for non-stationary $X$. A by-product of our investigation is a new representation of Pickands constant which is important for Monte-Carlo simulations and yields a sharp lower bound for Pickands constant.
- In this work, we study the key role of the generic Effective Field Theory (EFT) framework in quantifying correlation functions in a quasi de Sitter background for an arbitrary initial choice of the quantum vacuum state. We perform the computation in unitary gauge, applying the Stückelberg trick to the lowest-dimensional EFT operators that are broken under time diffeomorphism. Using this non-linear realization of broken time diffeomorphism, and truncating the action to the contribution of two-derivative terms in the metric, we compute the two-point and three-point correlations of scalar perturbations and the two-point correlation of tensor perturbations, quantifying the quantum fluctuations observed in the Cosmic Microwave Background (CMB) map. We also use the equilateral-limit and squeezed-limit configurations for the scalar three-point correlations in Fourier space. To make future predictions from the EFT setup and to check the consistency of our derived results for the correlations, we use results obtained from all classes of canonical single-field and general single-field $P(X,\phi)$ models. This analysis allows us to fix the coefficients of the relevant EFT operators in terms of the slow-roll parameters and the effective sound speed. Finally, using CMB observations from Planck, we constrain all of these EFT operator coefficients for the single-field slow-roll inflationary paradigm.
- In this article we study the linearized anisotropic Calderon problem. In a compact manifold with boundary, this problem amounts to showing that products of harmonic functions form a complete set. Assuming that the manifold is transversally anisotropic, we show that the boundary measurements determine an FBI type transform at certain points in the transversal manifold. This leads to recovery of transversal singularities in the linearized problem. The method requires a geometric condition on the transversal manifold related to pairs of intersecting geodesics, but it does not involve the geodesic X-ray transform which has limited earlier results on this problem.
- Dec 14 2017 cs.CV arXiv:1712.04711v1Convolutional Neural Networks, known as ConvNets, perform exceptionally well on many complex machine learning tasks. ConvNet architectures demand large, rich datasets and involve a vast number of parameters, which makes learning computationally expensive, slows convergence towards the global minimum, and risks trapping the network in local minima with poor predictions. In some cases, the architecture overfits the data and generalises poorly to new samples that were not in the training set. To address these limitations, many regularization and optimization strategies have been developed in the past few years, and studies suggest that these techniques can significantly increase network performance while reducing computational cost. To apply these techniques effectively, one must thoroughly understand the theory of how each technique increases the expressive power of the network. This article provides the theoretical concepts and mathematical formulation of the most commonly used strategies in developing a ConvNet architecture.
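As a concrete instance of the regularization strategies such an article covers, here is a minimal numpy sketch of inverted dropout (an assumed, illustrative example, not code from the article):

```python
import numpy as np

def dropout(activations, p_drop, train=True, rng=None):
    """Inverted dropout: during training, randomly zero a fraction
    p_drop of the units and rescale the survivors by 1/(1 - p_drop)
    so the expected activation is unchanged; at test time the layer
    is the identity, so no rescaling is needed at inference."""
    if not train or p_drop == 0.0:
        return activations
    rng = np.random.default_rng(rng)
    keep = 1.0 - p_drop
    mask = rng.random(activations.shape) < keep
    return activations * mask / keep
```

The rescaling is the detail most often gotten wrong: without it, expected activations shrink during training and the network sees a different input scale at test time.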
- For non-critical almost Mathieu operators with Diophantine frequency, we establish exponential asymptotics on the size of spectral gaps, and show that the spectrum is homogeneous. We also prove the homogeneity of the spectrum for Schrödinger operators with (measure-theoretically) typical quasi-periodic analytic potentials and fixed strong Diophantine frequency. As applications, we show the discrete version of Deift's conjecture \cite{Deift, Deift17} for subcritical analytic quasi-periodic initial data and solve a series of open problems of Damanik-Goldstein et al. \cite{BDGL, DGL1, dgsv, Go} and Kotani \cite{Kot97}.
- Dec 14 2017 cs.CV arXiv:1712.04695v1Recently proposed robust 3D face alignment methods establish either dense or sparse correspondence between a 3D face model and a 2D facial image. The use of these methods presents new challenges as well as opportunities for facial texture analysis. In particular, by sampling the image using the fitted model, a facial UV map can be created. Unfortunately, due to self-occlusion, such a UV map is always incomplete. In this paper, we propose a framework for training a Deep Convolutional Neural Network (DCNN) to complete the facial UV map extracted from in-the-wild images. To this end, we first gather complete UV maps by fitting a 3D Morphable Model (3DMM) to various multiview image and video datasets, as well as leveraging a new 3D dataset with over 3,000 identities. Second, we devise a meticulously designed architecture that combines local and global adversarial DCNNs to learn an identity-preserving facial UV completion model. We demonstrate that by attaching the completed UV to the fitted mesh and generating instances of arbitrary poses, we can increase pose variations for training deep face recognition/verification models and minimise pose discrepancy during testing, which leads to better performance. Experiments on both controlled and in-the-wild UV datasets prove the effectiveness of our adversarial UV completion model. We achieve state-of-the-art verification accuracy, $94.05\%$, under the CFP frontal-profile protocol simply by combining pose augmentation during training and pose discrepancy reduction during testing. We will release the first in-the-wild UV dataset (which we refer to as WildUV), comprising complete facial UV maps from 1,892 identities, for research purposes.
- Dec 14 2017 hep-th arXiv:1712.04684v1In this thesis, we investigate hidden symmetries for the Maldacena-Wilson loop in N=4 super Yang-Mills theory, mainly focusing on its strong-coupling description as a minimal surface in $AdS_5$. In the discussion of the symmetry structure of the underlying string model, we highlight the role of the master symmetry which can be employed to construct all symmetries of the model. The algebra of these symmetries is worked out. For the concrete case of minimal surfaces in $AdS_5$, we discuss the deformation of the four-cusp solution, which provides the dual description of the four-gluon scattering amplitude. This marks the first step toward transferring the master symmetry to scattering amplitudes. Moreover, we compute the master and Yangian symmetry variations of generic, smooth boundary curves. The discussion clarifies why previous attempts to transfer the deformations of minimal surfaces in $AdS_3$ to weak coupling were unsuccessful. We discuss several attempts to transfer the Yangian symmetry to weak or arbitrary coupling, but ultimately conclude that a Yangian symmetry of the Maldacena-Wilson loop seems not to be present. This is different for the natural supersymmetric generalizations of the Maldacena-Wilson loop, Wilson loops in superspace. Their one-loop expectation value is known to be Yangian invariant. We discuss the strong-coupling counterpart of this finding by considering minimal surfaces in the superspace of type IIB superstring theory in $AdS_5 \times S^5$. The comparison of the strong-coupling invariance derived here with the weak-coupling generators shows that the local term must depend on the coupling in a non-trivial way. Additionally, we show that the higher-level recurrences of the hypercharge generator, the so-called bonus symmetries, are present in all higher levels of the Yangian.
- The main theme of this thesis is the development of computational methods for classes of infinite-dimensional optimization problems arising in optimal control and information theory. The first part of the thesis is concerned with the optimal control of discrete-time continuous space Markov decision processes (MDP). The second part is centred around two fundamental problems in information theory that can be expressed as optimization problems: the channel capacity problem as well as the entropy maximization subject to moment constraints.
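The channel capacity problem mentioned in the second part is classically solved with the Blahut-Arimoto algorithm; a compact numpy sketch of that standard method (the thesis itself may use different computational techniques):

```python
import numpy as np

def blahut_arimoto(P, n_iter=200):
    """Blahut-Arimoto algorithm: capacity (in bits) of a discrete
    memoryless channel with transition matrix P[x, y] = P(y|x),
    together with the capacity-achieving input distribution."""
    def kl_rows(P, q):
        # D(P(.|x) || q) for each input symbol x, in nats,
        # with the convention 0 * log 0 = 0.
        ratio = np.where(P > 0, P / q, 1.0)
        return (P * np.log(ratio)).sum(axis=1)

    p = np.full(P.shape[0], 1.0 / P.shape[0])   # start from uniform p(x)
    for _ in range(n_iter):
        d = kl_rows(P, p @ P)                   # p @ P is the output law q(y)
        p = p * np.exp(d)                       # multiplicative update
        p /= p.sum()
    d = kl_rows(P, p @ P)
    return (p * d).sum() / np.log(2), p         # I(X;Y) at the final p, in bits
```

For a binary symmetric channel with crossover probability $\epsilon$ this recovers the textbook value $1 - H(\epsilon)$ bits.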
- Dec 14 2017 cs.IR arXiv:1712.04671v1The TREC Real-Time Summarization (RTS) track provides a framework for evaluating systems that monitor the Twitter stream and push tweets to users according to given profiles. It includes metrics, files, settings and hypotheses provided by the organizers. In this work, we perform a thorough analysis of each component of the framework used in 2016 and 2017 and find some limitations for Scenario A of this track. Our main findings point out the weaknesses of the metrics and give clear recommendations for reusing the collection fairly.
- Dec 14 2017 physics.flu-dyn arXiv:1712.04647v1The brachistochrone curve for a non-dissipative particle tries to maximize inertia of the particle but for a fluid filled cylinder, increasing inertia would amount to increased dissipative losses. Hence the trade-off between inertia and dissipation plays a vital role in determining the brachistochrone curve of a fluid filled cylinder. This trade-off manifests itself in the form of an integro-differential equation governing the angular acceleration of the cylinder. Here, we compute the brachistochrone curve of a fluid filled cylinder using optimal control principles and investigate the effect of the aforementioned trade-off on the deviation of the brachistochrone curve from that of a non-dissipative particle. Also, we investigate the effects of the non-dimensional parameters of the problem on the shape of the brachistochrone curve. We then analyze the stability of the time varying fluid flow in the cylinder and find an admissible region for the terminal point which would ensure the stability of the fluid flow as the cylinder rolls along the brachistochrone curve.
- Dec 14 2017 cs.CV arXiv:1712.04646v1Face completion aims to generate semantically plausible new pixels for missing facial components. It is a challenging generative task due to the large variations of face appearance. This paper studies generative face completion under structured occlusions. We treat face completion and corruption as disentangling and fusing processes of clean faces and occlusions, and propose a jointly disentangling and fusing Generative Adversarial Network (DF-GAN). First, three domains are constructed, corresponding to the distributions of occluded faces, clean faces and structured occlusions. The disentangling and fusing processes are formulated as transformations between the three domains. Then the disentangling and fusing networks are built to learn the transformations from unpaired data, where an encoder-decoder structure is adopted that allows DF-GAN to simulate structured occlusions by modifying the latent representations. Finally, the disentangling and fusing processes are unified into a dual learning framework along with an adversarial strategy. The proposed method is evaluated on the Meshface verification problem. Experimental results on four Meshface databases demonstrate the effectiveness of our proposed method for face completion under structured occlusions.
- Dec 14 2017 math.CA arXiv:1712.04630v1In this paper, we establish a generalized Taylor expansion of a given function $f$ in the form $\displaystyle{f(x) = \sum_{j=0}^m c_j^{\alpha,\rho}\left(x^\rho-a^\rho\right)^{j\alpha} + e_m(x)}$ with $m\in \mathbb{N}$, $c_j^{\alpha,\rho}\in \mathbb{R}$, $x>a$ and $0< \alpha \leq 1$. In the case $\rho = \alpha = 1$, this expression coincides with the classical Taylor formula. The coefficients $c_j^{\alpha,\rho}$, $j=0,\dots,m$, as well as an estimate of $e_m(x)$, are given in terms of generalized Caputo-type fractional derivatives. Applications of these results to the approximation of functions and to solving some fractional differential equations in series form are given as illustrations.
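In the classical case $\rho = \alpha = 1$ the coefficients reduce to $c_j = f^{(j)}(a)/j!$; a quick numeric sanity check of that reduction with the hypothetical example $f = \exp$ (every derivative of $\exp$ at $a$ equals $e^a$):

```python
import math

def taylor_exp(x, a, m):
    """Classical Taylor polynomial of exp about a, the rho = alpha = 1
    case of the generalized expansion: c_j = exp(a) / j!."""
    return sum(math.exp(a) / math.factorial(j) * (x - a) ** j
               for j in range(m + 1))
```

With $m = 10$ terms about $a = 1$, the remainder $e_m(1.5)$ is already below $10^{-9}$.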
- Dec 14 2017 cs.CV arXiv:1712.04621v1In this paper, we explore and compare multiple solutions to the problem of data augmentation in image classification. Previous work has demonstrated the effectiveness of data augmentation through simple techniques, such as cropping, rotating, and flipping input images. We artificially constrain our access to data to a small subset of the ImageNet dataset, and compare each data augmentation technique in turn. One of the more successful data augmentation strategies is the set of traditional transformations mentioned above. We also experiment with GANs to generate images of different styles. Finally, we propose a method that allows a neural net to learn augmentations that best improve the classifier, which we call neural augmentation. We discuss the successes and shortcomings of this method on various datasets.
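The traditional transformations named above (cropping, rotating, flipping) are easy to reproduce; a minimal numpy sketch (function names are illustrative, and the rotation is restricted to 90 degrees for simplicity):

```python
import numpy as np

def random_crop(img, crop_h, crop_w, rng=None):
    """Take a random crop_h x crop_w window from an H x W x C image."""
    rng = np.random.default_rng(rng)
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return img[top:top + crop_h, left:left + crop_w]

def traditional_augmentations(img, rng=None):
    """Return three augmented copies of an image: a horizontal flip,
    a 90-degree rotation, and a random half-size crop."""
    return [np.flip(img, axis=1),
            np.rot90(img),
            random_crop(img, img.shape[0] // 2, img.shape[1] // 2, rng)]
```

In practice each transformed copy is added to the training set with the original label, multiplying the effective amount of labelled data.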
- Dec 14 2017 cond-mat.stat-mech arXiv:1712.04619v1A theoretical analysis of active motion on curved surfaces is presented in terms of a generalization of the telegrapher's equation. This generalized equation is derived explicitly as the polar approximation of the hierarchy of equations obtained from the corresponding Fokker-Planck equation of active particles diffusing on curved surfaces. The general solution to the generalized telegrapher's equation is given for a pulse with vanishing current as initial data. Expressions for the probability density and the mean squared geodesic displacement are given in the limit of weak curvature. As an explicit example of the formulated theory, the case of active motion on the sphere is presented, where the oscillations observed in the mean squared geodesic displacement are explained.
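The flat, one-dimensional telegrapher's equation that the paper generalizes, $\partial_t^2 p + \tau^{-1}\partial_t p = v^2 \partial_x^2 p$, can be integrated with a standard explicit finite-difference scheme; a minimal sketch for a pulse with vanishing initial current on a periodic grid (parameters are illustrative, not taken from the paper):

```python
import numpy as np

def telegrapher_step(p, p_prev, dt, dx, v, tau):
    """One explicit step of p_tt + p_t/tau = v^2 p_xx on a periodic 1D
    grid, using central differences in both space and time."""
    lap = (np.roll(p, -1) - 2 * p + np.roll(p, 1)) / dx**2
    a = 1.0 / dt**2 + 1.0 / (2 * tau * dt)
    rhs = (2 * p - p_prev) / dt**2 + p_prev / (2 * tau * dt) + v**2 * lap
    return rhs / a

# Illustrative run: a narrow normalized pulse with vanishing initial current.
dx, dt, v, tau = 0.1, 0.05, 1.0, 1.0          # CFL number v*dt/dx = 0.5
x = np.arange(0.0, 10.0, dx)
p = np.exp(-((x - 5.0) ** 2) / 0.1)
p /= p.sum() * dx                              # normalize the density
p_prev = p.copy()                              # zero current: p(0) = p(-dt)
for _ in range(200):
    p, p_prev = telegrapher_step(p, p_prev, dt, dx, v, tau), p
```

With periodic boundaries the scheme conserves total probability exactly, which is a useful check on any implementation of the generalized equation as well.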
- Dec 14 2017 cs.CV arXiv:1712.04616v1Hashing is widely applied to large-scale image retrieval due to its storage and retrieval efficiency. Existing work on deep hashing assumes that the database in the target domain is identically distributed with the training set in the source domain. This paper relaxes this assumption to a transfer retrieval setting, which allows the database and the training set to come from different but related domains. However, the transfer retrieval setting introduces two technical difficulties: first, the hash model trained on the source domain cannot work well on the target domain due to the large distribution gap; second, the domain gap makes it difficult to concentrate the database points within a small Hamming ball. As a consequence, transfer retrieval performance within Hamming radius 2 degrades significantly in existing hashing methods. This paper presents Transfer Adversarial Hashing (TAH), a new hybrid deep architecture that incorporates a pairwise $t$-distribution cross-entropy loss to learn concentrated hash codes and an adversarial network to align the data distributions between the source and target domains. TAH can generate compact transfer hash codes for efficient image retrieval on both source and target domains. Comprehensive experiments validate that TAH yields state-of-the-art Hamming space retrieval performance on standard datasets.
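Retrieval "within Hamming radius 2" means returning only those database codes whose Hamming distance to the query code is at most 2, which is why concentrating codes inside a small ball matters; a numpy sketch of the retrieval step itself (illustrative, not the TAH model):

```python
import numpy as np

def hamming_ball_retrieve(query_code, db_codes, radius=2):
    """Return indices of database binary codes (0/1 integer arrays)
    whose Hamming distance to the query code is <= radius."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.flatnonzero(dists <= radius)
```

If the hash model fails to concentrate relevant database points inside this ball, the function simply returns few or no results, which is the degradation the paper targets.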
- Dec 14 2017 cs.AR arXiv:1712.04614v1This work explores the lesser-studied objective of optimizing the multiply-and-accumulate operations executed during evaluation of a neural network. In particular, we propose using the Residue Number System (RNS) as the internal number representation across all layer evaluations, allowing us to explore the use of more power-efficient RNS multipliers and adders. Using results from simulation of our RNS arithmetic block implementations, we show the theoretical power advantages of using RNS for an end-to-end evaluator.
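In an RNS, an integer is represented by its residues modulo a set of pairwise coprime moduli, and multiply-accumulate operations act independently on each residue channel with no carries between channels, which is the source of the power savings; a minimal sketch with reconstruction via the Chinese Remainder Theorem (the moduli here are chosen for illustration, not taken from the paper):

```python
from functools import reduce

MODULI = (251, 253, 255, 256)   # pairwise coprime

def to_rns(x, moduli=MODULI):
    """Represent integer x by its residue in each channel."""
    return tuple(x % m for m in moduli)

def rns_mac(acc, a, b, moduli=MODULI):
    """acc + a*b computed channel-wise: no carries cross channels."""
    return tuple((r + s * t) % m for r, s, t, m in zip(acc, a, b, moduli))

def from_rns(res, moduli=MODULI):
    """Chinese Remainder Theorem reconstruction of the integer."""
    M = reduce(lambda x, y: x * y, moduli)
    x = 0
    for r, m in zip(res, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)   # pow(., -1, m): modular inverse
    return x % M
```

The result is exact as long as the accumulated value stays below the product of the moduli, which is the dynamic-range constraint any RNS accelerator must budget for.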
- In this paper, we provide a Rapid Orthogonal Approximate Slepian Transform (ROAST) for the discrete vector one obtains when collecting a finite set of uniform samples from a baseband analog signal. The ROAST offers an orthogonal projection which is an approximation to the orthogonal projection onto the leading discrete prolate spheroidal sequence (DPSS) vectors (also known as Slepian basis vectors). As such, the ROAST is guaranteed to accurately and compactly represent not only oversampled bandlimited signals but also the leading DPSS vectors themselves. Moreover, the subspace angle between the ROAST subspace and the corresponding DPSS subspace can be made arbitrarily small. The complexity of computing the representation of a signal using the ROAST is comparable to the FFT, which is much less than the complexity of using the DPSS basis vectors. We also give non-asymptotic results to guarantee that the proposed basis not only provides a very high degree of approximation accuracy in a mean-square error sense for bandlimited sample vectors, but also that it can provide high-quality approximations of all sampled sinusoids within the band of interest.
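The DPSS (Slepian) basis that ROAST approximates can itself be computed as the leading eigenvectors of the prolate matrix with entries $\sin(2\pi W(m-n))/(\pi(m-n))$; a numpy sketch of that direct route (an $O(N^3)$ eigendecomposition, precisely the cost ROAST is designed to sidestep):

```python
import numpy as np

def dpss_basis(N, W, K):
    """Leading K discrete prolate spheroidal sequences of length N with
    half-bandwidth W, via eigendecomposition of the prolate matrix."""
    m = np.arange(N)
    diff = m[:, None] - m[None, :]
    with np.errstate(divide='ignore', invalid='ignore'):
        B = np.sin(2 * np.pi * W * diff) / (np.pi * diff)
    B[np.arange(N), np.arange(N)] = 2 * W        # sinc limit on the diagonal
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]            # largest eigenvalues first
    return eigvecs[:, order[:K]], eigvals[order[:K]]
```

The leading roughly $2NW$ eigenvalues are close to 1, which is why the corresponding eigenvectors compactly represent oversampled bandlimited vectors.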
- Learning customer preferences from an observed behaviour is an important topic in the marketing literature. Structural models typically model forward-looking customers or firms as utility-maximizing agents whose utility is estimated using methods of Stochastic Optimal Control. We suggest an alternative approach to study dynamic consumer demand, based on Inverse Reinforcement Learning (IRL). We develop a version of the Maximum Entropy IRL that leads to a highly tractable model formulation that amounts to low-dimensional convex optimization in the search for optimal model parameters. Using simulations of consumer demand, we show that observational noise for identical customers can be easily confused with an apparent consumer heterogeneity.
- This paper presents a discrete-time option pricing model that is rooted in Reinforcement Learning (RL), and more specifically in the famous Q-Learning method of RL. We construct a risk-adjusted Markov Decision Process for a discrete-time version of the classical Black-Scholes-Merton (BSM) model, where the option price is an optimal Q-function. Pricing is done by learning to dynamically optimize risk-adjusted returns for an option replicating portfolio, as in the Markowitz portfolio theory. Using Q-Learning and related methods, once created in a parametric setting, the model is able to go model-free and learn to price and hedge an option directly from data generated from a dynamic replicating portfolio which is rebalanced at discrete times. If the world is according to BSM, our risk-averse Q-Learner converges, given enough training data, to the true BSM price and hedge ratio of the option in the continuous time limit, even if hedges applied at the stage of data generation are completely random (i.e. it can learn the BSM model itself, too!), because Q-Learning is an off-policy algorithm. If the world is different from a BSM world, the Q-Learner will find it out as well, because Q-Learning is a model-free algorithm. For finite time steps, the Q-Learner is able to efficiently calculate both the optimal hedge and optimal price for the option directly from trading data, and without an explicit model of the world. This suggests that RL may provide efficient data-driven and model-free methods for optimal pricing and hedging of options, once we depart from the academic continuous-time limit, and vice versa, option pricing methods developed in Mathematical Finance may be viewed as special cases of model-based Reinforcement Learning. Our model only needs basic linear algebra (plus Monte Carlo simulation, if we work with synthetic data).
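The off-policy property invoked above, that Q-Learning recovers the optimal policy even when the data-generating behaviour is completely random, can be illustrated on a toy MDP (a simple corridor, not the paper's option-pricing MDP):

```python
import numpy as np

def q_learning_corridor(n_states=5, episodes=500, alpha=0.5,
                        gamma=0.9, seed=0):
    """Tabular Q-Learning on a deterministic corridor: start at state 0,
    action 1 moves right, action 0 moves left (reflecting at 0), and a
    reward of 1 is earned on reaching the terminal right end.  The
    behaviour policy is uniformly random, i.e. fully off-policy."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, 2))
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            a = rng.integers(2)                       # random behaviour
            s2 = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Bootstrap from the greedy value, not the behaviour policy.
            target = r + gamma * Q[s2].max() * (s2 != n_states - 1)
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
    return Q
```

Despite the random hedges, so to speak, of the behaviour policy, the greedy policy extracted from the learned Q-table is the optimal "always move right", mirroring the paper's claim that random hedges at data-generation time do not prevent recovery of the optimal price and hedge.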