- We propose a new Bayesian Neural Net (BNN) formulation that affords variational inference for which the evidence lower bound (ELBO) is analytically tractable subject to a tight approximation. We achieve this tractability by decomposing ReLU nonlinearities into an identity function and a Kronecker delta function. We demonstrate formally that assigning the outputs of these functions to separate latent variables allows representing the neural network likelihood as the composition of a chain of linear operations. Performing variational inference on this construction enables closed-form computation of the evidence lower bound. It can thus be maximized without requiring Monte Carlo sampling to approximate the problematic expected log-likelihood term. The resultant formulation boils down to stochastic gradient descent, where the gradients are not distorted by any factor besides minibatch selection. This amends a long-standing disadvantage of BNNs relative to deterministic nets. Experiments on four benchmark data sets show that the cleaner gradients provided by our construction yield a steeper learning curve, achieving higher prediction accuracies for a fixed epoch budget.
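
The decomposition described in this abstract can be illustrated with a minimal sketch. It assumes the gate is a 0/1 indicator of a positive pre-activation (the abstract calls it a Kronecker delta) and uses a hypothetical two-layer network with made-up helper names; it is not the authors' implementation. Once the gates are held fixed, the forward pass is a chain of linear operations.

```python
def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def relu_net(W1, W2, x):
    """Ordinary two-layer ReLU network (hypothetical example architecture)."""
    h = matvec(W1, x)
    a = [max(0.0, hi) for hi in h]
    return matvec(W2, a)

def gates(W1, x):
    """0/1 gate per hidden unit: 1 where the pre-activation is positive."""
    return [1.0 if hi > 0 else 0.0 for hi in matvec(W1, x)]

def gated_linear_net(W1, W2, x, z):
    """Same network with ReLU written as identity(h) * gate z.

    With z fixed, this computes W2 @ diag(z) @ W1 @ x, which is linear in x.
    """
    h = matvec(W1, x)
    a = [zi * hi for zi, hi in zip(z, h)]
    return matvec(W2, a)

W1 = [[1.0, -2.0], [0.5, 1.0], [-1.0, -1.0]]
W2 = [[1.0, 2.0, 3.0]]
x = [1.0, 1.0]
z = gates(W1, x)
print(relu_net(W1, W2, x), gated_linear_net(W1, W2, x, z))
```

Both calls give the same output for the same input, because multiplying by the gate reproduces the ReLU exactly.
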
- May 22 2018 cs.CV arXiv:1805.07653v1 Generative models of human identity and appearance have broad applicability to behavioral science and technology, but the exquisite sensitivity of human face perception means that their utility hinges on the alignment of the model's representation to human psychological representations and the photorealism of the generated images. Meeting these requirements is an exacting task, and existing models of human identity and appearance are often unworkably abstract, artificial, uncanny, or biased. Here, we use a variational autoencoder with an autoregressive decoder to learn a face space from a uniquely diverse dataset of portraits that control much of the variation irrelevant to human identity and appearance. Our method generates photorealistic portraits of fictive identities with a smooth, navigable latent space. We validate our model's alignment with human sensitivities by introducing a psychophysical Turing test for images, which humans mostly fail. Lastly, we demonstrate an initial application of our model to the problem of fast search in mental space to obtain detailed "police sketches" in a small number of trials.
- While existing social networking services tend to connect people who know each other, people also show a desire to connect to as yet unknown people in physical proximity. Existing research shows that people tend to connect to similar people. Utilizing technology in order to stimulate human interaction between strangers, we consider the scenario of two strangers meeting. Using similarity in musical taste as an example, we develop a solution for the problem of similarity estimation in proximity-based mobile social networks. We show that a single exchange of a probabilistic data structure between two devices can closely estimate the similarity of two users - without the need to contact a third-party server. We introduce metrics for fast and space-efficient approximation of the Dice coefficient of two multisets - based on the comparison of two Counting Bloom Filters or two Count-Min Sketches. Our analysis shows that utilizing a single hash function minimizes the error when comparing these probabilistic data structures. The size that should be chosen for the data structure depends on the expected average number of unique input elements. Using real user data, we show that a Counting Bloom Filter with a single hash function and a length of 128 is sufficient to accurately estimate the similarity between two multisets representing the musical tastes of two users. Our approach generalizes to any other similarity estimation of frequencies represented as multisets.
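
As a rough illustration of the idea in this abstract, the sketch below estimates the Dice coefficient of two multisets from two counting filters built with a single hash function and 128 counters. The estimator (bucket-wise minimum of counters), the SHA-256-based bucket function, and all function names are assumptions made for this example; the abstract does not spell out the paper's exact metrics.

```python
import hashlib
from collections import Counter

def bucket(item, m):
    """Map a string to one of m counters using a single, deterministic hash."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % m

def counting_filter(multiset, m=128):
    """Counting filter with one hash function: add each item's count to its bucket."""
    counts = [0] * m
    for item, n in multiset.items():
        counts[bucket(item, m)] += n
    return counts

def dice_exact(a, b):
    """Dice coefficient of two multisets: 2 * |A intersect B| / (|A| + |B|)."""
    overlap = sum(min(a[k], b[k]) for k in a.keys() & b.keys())
    total = sum(a.values()) + sum(b.values())
    return 2 * overlap / total if total else 0.0

def dice_estimate(fa, fb):
    """Estimate from two filters of equal length; collisions can only inflate it."""
    overlap = sum(min(x, y) for x, y in zip(fa, fb))
    total = sum(fa) + sum(fb)
    return 2 * overlap / total if total else 0.0

a = Counter({"rock": 5, "jazz": 2, "pop": 1})
b = Counter({"rock": 3, "jazz": 4, "folk": 2})
print(dice_exact(a, b), dice_estimate(counting_filter(a), counting_filter(b)))
```

Only the two counter arrays need to be exchanged between devices. In this sketch the estimate never falls below the exact value, since hash collisions can only add to the bucket-wise overlap.
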
- Most approaches that model time-series data in human activity recognition based on body-worn sensing (HAR) use a fixed-size temporal context to represent different activities. This might, however, not be apt for sets of activities with individually varying durations. We introduce attention models into HAR research as a data-driven approach for exploring relevant temporal context. Attention models learn a set of weights over input data, which we leverage to weight the temporal context being considered to model each sensor reading. We construct attention models for HAR by adding attention layers to a state-of-the-art deep learning HAR model (DeepConvLSTM) and evaluate our approach on benchmark datasets, achieving a significant increase in performance. Finally, we visualize the learned weights to better understand what constitutes relevant temporal context.
- May 22 2018 cs.CV arXiv:1805.07647v1 Modern convolutional neural networks (CNNs) are able to achieve human-level object classification accuracy on specific tasks, and currently outperform competing models in explaining complex human visual representations. However, the categorization problem is posed differently for these networks than for humans: the accuracy of these networks is evaluated by their ability to identify single labels assigned to each image. These labels often cut arbitrarily across natural psychological taxonomies (e.g., dogs are separated into breeds, but never jointly categorized as "dogs"), and bias the resulting representations. By contrast, it is common for children to hear both "dog" and "Dalmatian" to describe the same stimulus, helping to group perceptually disparate objects (e.g., breeds) into a common mental class. In this work, we train CNN classifiers with multiple labels for each image that correspond to different levels of abstraction, and use this framework to reproduce classic patterns that appear in human generalization behavior.
- May 22 2018 cs.CV arXiv:1805.07646v1 This paper investigates long-term face tracking of a specific person given his/her face image in a single frame as a query in a video stream. By taking advantage of deep learning models pre-trained on big data, a novel system is developed for accurate video face tracking in unconstrained environments depicting various people and objects moving in and out of the frame. In the proposed system, we present a detection-verification-tracking method (dubbed 'DVT') which accomplishes the long-term face tracking task through the collaboration of face detection, face verification, and (short-term) face tracking. An offline-trained detector based on cascaded convolutional neural networks localizes all faces appearing in the frames, and an offline-trained face verifier based on deep convolutional neural networks and similarity metric learning decides whether any face, and if so which one, corresponds to the queried person. An online-trained tracker follows the face from frame to frame. When validated on a sitcom episode and a TV show, the DVT method outperforms tracking-learning-detection (TLD) and face-TLD in terms of recall and precision. The proposed system is also tested on many other types of videos and shows very promising results.
- May 22 2018 cs.CV arXiv:1805.07644v1 Understanding how people represent categories is a core problem in cognitive science. Decades of research have yielded a variety of formal theories of categories, but validating them with naturalistic stimuli is difficult. The challenge is that human category representations cannot be directly observed and running informative experiments with naturalistic stimuli such as images requires a workable representation of these stimuli. Deep neural networks have recently been successful in solving a range of computer vision tasks and provide a way to compactly represent image features. Here, we introduce a method to estimate the structure of human categories that combines ideas from cognitive science and machine learning, blending human-based algorithms with state-of-the-art deep image generators. We provide qualitative and quantitative results as a proof-of-concept for the method's feasibility. Samples drawn from human distributions rival those from state-of-the-art generative models in quality and outperform alternative methods for estimating the structure of human categories.
- We address the problem of semi-supervised domain adaptation of classification algorithms through deep Q-learning. The core idea is to consider the predictions of a source domain network on target domain data as noisy labels, and learn a policy to sample from this data so as to maximize classification accuracy on a small annotated reward partition of the target domain. Our experiments show that learned sampling policies construct labeled sets that improve accuracies of visual classifiers over baselines.
- May 22 2018 math.CT arXiv:1805.07635v1 We continue the study of enriched infinity categories, using a definition equivalent to that of Gepner and Haugseng. In our approach enriched infinity categories are associative monoids in a specially designed monoidal category of enriched quivers. We prove that, in case the monoidal structure in the basic category M comes from direct product, our definition is essentially equivalent to the approach via Segal objects. Furthermore, we compare our notion with the notion of category left-tensored over M, and prove a version of the Yoneda lemma in this context.
- May 22 2018 cs.CV arXiv:1805.07632v1 Given data, deep generative models, such as variational autoencoders (VAE) and generative adversarial networks (GAN), train a lower dimensional latent representation of the data space. The linear Euclidean geometry of data space pulls back to a nonlinear Riemannian geometry on the latent space. The latent space thus provides a low-dimensional nonlinear representation of data and classical linear statistical techniques are no longer applicable. In this paper we show how statistics of data in their latent space representation can be performed using techniques from the field of nonlinear manifold statistics. Nonlinear manifold statistics provide generalizations of Euclidean statistical notions including means, principal component analysis, and maximum likelihood fits of parametric probability distributions. We develop new techniques for maximum likelihood inference in latent space, and address the computational complexity of using geometric algorithms with high-dimensional data by training a separate neural network to approximate the Riemannian metric and cometric tensor capturing the shape of the learned data manifold.
- In this paper we consider Multiple-Input-Multiple-Output (MIMO) detection using deep neural networks. We introduce two different deep architectures: a standard fully connected multi-layer network, and a Detection Network (DetNet) which is specifically designed for the task. The structure of DetNet is obtained by unfolding the iterations of a projected gradient descent algorithm into a network. We compare the accuracy and runtime complexity of the proposed approaches and achieve state-of-the-art performance while maintaining low computational requirements. Furthermore, we manage to train a single network to detect over an entire distribution of channels. Finally, we consider detection with soft outputs and show that the networks can easily be modified to produce soft decisions.
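
The projected gradient descent iteration that this abstract says DetNet unfolds can be sketched as follows. The sketch assumes a real-valued channel model y = Hx + noise with symbols in {-1, +1}, a fixed step size, and projection onto the box [-1, 1]; these choices and all names are illustrative assumptions. DetNet itself replaces the fixed iteration with trained layers, which is not shown here.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def projected_gradient_detect(H, y, step=0.1, iters=50):
    """Minimize 0.5 * ||H x - y||^2 over the box [-1, 1]^K, then take signs.

    Each iteration is a gradient step followed by a projection (clipping).
    The step size and iteration count are hypothetical defaults.
    """
    Ht = transpose(H)
    x = [0.0] * len(H[0])
    for _ in range(iters):
        residual = [a - b for a, b in zip(matvec(H, x), y)]
        grad = matvec(Ht, residual)  # gradient of the least-squares objective
        x = [min(1.0, max(-1.0, xi - step * gi)) for xi, gi in zip(x, grad)]
    return [1 if xi >= 0 else -1 for xi in x]

H = [[1.0, 0.2], [0.1, 1.0], [0.3, -0.4]]
x_true = [1, -1]
y = matvec(H, x_true)  # noiseless received vector for the toy example
print(projected_gradient_detect(H, y))
```

Unfolding means treating each loop iteration as one network layer, so quantities such as the step size can be learned per layer instead of being fixed by hand.
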
- Network pruning is of great importance because it eliminates unimportant weights or features that are activated due to network over-parametrization. Advantages of sparsity enforcement include preventing overfitting and speeding up computation. Given the large number of parameters in deep architectures, network compression becomes critically important because of the huge amount of computational power required. In this work, we impose structured sparsity for speaker verification, which is the validation of the query speaker against the speaker gallery. We show that mere sparsity enforcement can improve the verification results, possibly because of initial overfitting in the network.
- Local competition among neighboring neurons is a common procedure taking place in biological systems. This finding has inspired research on more biologically plausible deep networks that comprise competing linear units; such models can be effectively trained by means of gradient-based backpropagation. This is in contrast to traditional deep networks, built of nonlinear units that do not entail any form of (local) competition. However, for the case of competition-based networks, the problem of data-driven inference of their most appropriate configuration, including the needed number of connections or locally competing sets of units, has not been touched upon by the research community. This work constitutes the first attempt to address this shortcoming; to this end, we leverage solid arguments from the field of Bayesian nonparametrics. Specifically, we introduce auxiliary discrete latent variables of model component utility, and perform Bayesian inference over them. We impose appropriate stick-breaking priors over the introduced discrete latent variables; these give rise to a well-established sparsity-inducing mechanism. We devise efficient inference algorithms for our model by resorting to stochastic gradient variational Bayes. We perform an extensive experimental evaluation of our approach using benchmark data. Our results verify that we obtain state-of-the-art accuracy, albeit via networks of much smaller memory and computational footprint than the competition.
- May 22 2018 stat.ME arXiv:1805.07622v1 Accurate diagnosis of disease is of great importance in clinical practice and medical research. The receiver operating characteristic (ROC) surface is a popular tool for evaluating the discriminatory ability of continuous diagnostic test outcomes when there exist three ordered disease classes (e.g., no disease, mild disease, advanced disease). We propose the Bayesian bootstrap, a fully nonparametric method, for conducting inference about the ROC surface and its functionals, such as the volume under the surface. The proposed method is based on a simple, yet interesting, representation of the ROC surface in terms of placement variables. Results from a simulation study demonstrate the ability of our method to successfully recover the true ROC surface and to produce valid inferences in a variety of complex scenarios. An application to data from the Trail Making Test to assess cognitive impairment in Parkinson's disease patients is provided.
- May 22 2018 cs.CV arXiv:1805.07621v1 In this paper, we formalize the idea behind capsule nets of using a capsule vector rather than a neuron activation to predict the label of samples. To this end, we propose to learn a group of capsule subspaces onto which an input feature vector is projected. Then the lengths of the resultant capsules are used to score the probability of belonging to different classes. We train such a Capsule Projection Network (CapProNet) by learning an orthogonal projection matrix for each capsule subspace, and show that each capsule subspace is updated until it contains input feature vectors corresponding to the associated class. Only a small, negligible computing overhead is incurred to train the network in low-dimensional capsule subspaces or through an alternative hyper-power iteration to estimate the normalization matrix. Experimental results on image datasets show the presented model can greatly improve the performance of state-of-the-art ResNet backbones by $10-20\%$ at the same level of computing and memory costs.
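
The scoring step described in this abstract, projecting a feature vector onto one subspace per class and comparing the projection lengths, can be sketched as below. The subspaces here are fixed, hand-picked examples, and Gram-Schmidt is used to compute the orthogonal projection; how CapProNet learns the subspaces and normalizes the projection is not covered, and all names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors):
    """Modified Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(q, w)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > 1e-12:  # skip vectors that are (numerically) dependent
            basis.append([wi / norm for wi in w])
    return basis

def capsule_length(x, subspace):
    """Length of the orthogonal projection of x onto span(subspace)."""
    basis = orthonormal_basis(subspace)
    return math.sqrt(sum(dot(q, x) ** 2 for q in basis))

def classify(x, subspaces):
    """Return the index of the subspace with the longest projection, and all lengths."""
    lengths = [capsule_length(x, s) for s in subspaces]
    return lengths.index(max(lengths)), lengths

# Two toy class subspaces in R^3: the xy-plane and the z-axis.
subspaces = [[[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]], [[0.0, 0.0, 2.0]]]
label, lengths = classify([3.0, 4.0, 12.0], subspaces)
print(label, lengths)
```

In a trained model the lengths would be turned into class scores by the network's loss; this sketch simply picks the largest one.
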
- May 22 2018 quant-ph arXiv:1805.07620v1 Non-Hermitian singularities are ubiquitous in non-conservative open systems. These singularities are often points of measure zero in the eigenspectrum of the system, which makes them difficult to access without careful engineering. Despite that, they can remotely induce observable effects when some of the system's parameters are varied along closed trajectories in the parameter space. To date, a general formalism for describing this process beyond simple cases is still lacking. Here, we bridge this gap and develop a general approach for treating this problem by utilizing the power of permutation operators and representation theory. This in turn allows us to reveal the following surprising result, which contradicts the common belief in the field: loops that enclose the same singularities, starting from the same initial point and traveling in the same direction, do not necessarily share the same end outcome. Interestingly, we find that this equivalence can be formally established only by invoking the topological notion of homotopy. Our findings are general, with far-reaching implications in various fields ranging from photonics and atomic physics to microwaves and acoustics.
- May 22 2018 hep-ph arXiv:1805.07619v1 We study in detail the vacuum structure of a composite two Higgs doublet model based on a minimal underlying theory with 3 Dirac fermions in pseudo-real representations of the condensing gauge interactions, leading to the SU(6)/Sp(6) symmetry breaking pattern. We find that, independently of the source of the top mass, the most general CP-conserving vacuum is characterised by three non-vanishing angles. A special case occurs if the Yukawas are aligned, leading to a single angle. In the latter case, a Dark Matter candidate arises, protected by a global U(1) symmetry.
- Feed-forward networks are widely used in cross-modal applications to bridge modalities by mapping distributed vectors of one modality to the other, or to a shared space. The predicted vectors are then used to perform e.g., retrieval or labeling. Thus, the success of the whole system relies on the ability of the mapping to make the neighborhood structure (i.e., the pairwise similarities) of the predicted vectors akin to that of the target vectors. However, whether this is achieved has not been investigated yet. Here, we propose a new similarity measure and two ad hoc experiments to shed light on this issue. In three cross-modal benchmarks we learn a large number of language-to-vision and vision-to-language neural network mappings (up to five layers) using a rich diversity of image and text features and loss functions. Our results reveal that, surprisingly, the neighborhood structure of the predicted vectors consistently resembles more that of the input vectors than that of the target vectors. In a second experiment, we further show that untrained nets do not significantly disrupt the neighborhood (i.e., semantic) structure of the input vectors.
- May 22 2018 cs.CV arXiv:1805.07615v1 Bionic design refers to an approach of generative creativity in which a target object (e.g. a floor lamp) is designed to contain features of biological source objects (e.g. flowers), resulting in creative biologically-inspired design. In this work, we attempt to model the process of shape-oriented bionic design as follows: given an input image of a design target object, the model generates images that 1) maintain shape features of the input design target image, 2) contain shape features of images from the specified biological source domain, 3) are plausible and diverse. We propose DesignGAN, a novel unsupervised deep generative approach to realising bionic design. Specifically, we employ a conditional Generative Adversarial Networks architecture with several designated losses (an adversarial loss, a regression loss, a cycle loss and a latent loss) that respectively constrain our model to meet the corresponding aforementioned requirements of bionic design modelling. We perform qualitative and quantitative experiments to evaluate our method, and demonstrate that our proposed approach successfully generates creative images of bionic design.
- The nodal set of a Laplacian eigenfunction forms a partition of the underlying manifold or graph. Another natural partition is based on the gradient vector field of the eigenfunction (on a manifold) or on the extremal points of the eigenfunction (on a graph). The submanifolds (or subgraphs) of this partition are called Neumann domains. This paper reviews the subject as it appears in a few recent works, and points out some open questions and conjectures. The paper concerns both manifolds and metric graphs, and the exposition allows for a comparison between the results obtained for each of them.
- We present a prescription for using the a central charge to determine the flow of a strongly coupled supersymmetric theory from its weakly coupled dual. The approach is based on the equivalence of the scale-dependent a-parameter derived from the four-dilaton amplitude with the a-parameter determined from the Lagrange multiplier method with scale-dependent R-charges. We explicitly demonstrate this equivalence for massive free N=1 superfields and for weakly coupled SQCD.
- Reinforcement learning (RL) algorithms have made huge progress in recent years by leveraging the power of deep neural networks (DNN). Despite the success, deep RL algorithms are known to be sample inefficient, often requiring many rounds of interaction with the environments to obtain satisfactory performance. Recently, episodic memory based RL has attracted attention due to its ability to latch onto good actions quickly. In this paper, we present a simple yet effective biologically inspired RL algorithm called Episodic Memory Deep Q-Networks (EMDQN), which leverages episodic memory to supervise an agent during training. Experiments show that our proposed method can lead to better sample efficiency and is more likely to find good policies. It requires only 1/5 of the interactions of DQN to achieve state-of-the-art performance on many Atari games, significantly outperforming regular DQN and other episodic memory based RL algorithms.
- We propose a deep generative Markov State Model (DeepGenMSM) learning framework for inference of metastable dynamical systems and prediction of trajectories. After unsupervised training on time series data, the model contains (i) a probabilistic encoder that maps from high-dimensional configuration space to a small-sized vector indicating the membership to metastable (long-lived) states, (ii) a Markov chain that governs the transitions between metastable states and facilitates analysis of the long-time dynamics, and (iii) a generative part that samples the conditional distribution of configurations in the next time step. The model can be operated in a recursive fashion to generate trajectories to predict the system evolution from a defined starting state and propose new configurations. The DeepGenMSM is demonstrated to provide accurate estimates of the long-time kinetics and generate valid distributions for molecular dynamics (MD) benchmark systems. Remarkably, we show that DeepGenMSMs are able to make long time-steps in molecular configuration space and generate physically realistic structures in regions that were not seen in training data.
- May 22 2018 cs.CY arXiv:1805.07598v1 The article is written to identify the requirements for an Open Data Specialist. The ability to use and work with open data affects many areas: sociology, urban studies, geography, statistics, public administration, data journalism, etc. It is especially important to develop and implement training courses on open data for non-IT students. Typically, the specialization of these students contains an insufficient number of lessons on working with data and with open data. Students (hereinafter - researchers) feel a great need to eliminate "digital illiteracy" and to master the skills of working with open data. The development of a specialty (i.e. a set of training courses) on open data is designed to solve the problem of lack of knowledge and skills in working with open data. In this paper, the authors attempt to generalize the requirements for an expert in open data and offer an overview of information sources on the topic of hiring such specialists. The authors justify the need to create a specialty on open data for non-core students as well. It is supposed that the specialty will be taught in English, a non-native language for the students.
- Boosted decision trees enjoy popularity in a variety of applications; however, for large-scale datasets, the cost of training a decision tree in each round can be prohibitively expensive. Inspired by ideas from the multi-arm bandit literature, we develop a highly efficient algorithm for computing exact greedy-optimal decision trees, outperforming the state-of-the-art Quick Boost method. We further develop a framework for deriving lower bounds on the problem that applies to a wide family of conceivable algorithms for the task (including our algorithm and Quick Boost), and we demonstrate empirically on a wide variety of data sets that our algorithm is near-optimal within this family of algorithms. We also derive a lower bound applicable to any algorithm solving the task, and we demonstrate that our algorithm empirically achieves performance close to this best-achievable lower bound.
- May 22 2018 cs.IR arXiv:1805.07591v1 This paper presents the Entity-Duet Neural Ranking Model (EDRM), which introduces knowledge graphs to neural search systems. EDRM represents queries and documents by their words and entity annotations. The semantics from knowledge graphs are integrated in the distributed representations of their entities, while the ranking is conducted by interaction-based neural ranking networks. The two components are learned end-to-end, making EDRM a natural combination of entity-oriented search and neural information retrieval. Our experiments on a commercial search log demonstrate the effectiveness of EDRM. Our analyses reveal that knowledge graph semantics significantly improve the generalization ability of neural ranking models.
- May 22 2018 cs.CG arXiv:1805.07589v1 Ordinal Embedding places n objects into R^d based on comparisons such as "a is closer to b than c." Current optimization-based approaches suffer from scalability problems and an abundance of low quality local optima. We instead consider a computational geometric approach based on selecting comparisons to discover points close to nearly-orthogonal "axes" and embed the whole set by their projections along each axis. We thus also estimate the dimensionality of the data. Our embeddings are of lower quality than the global optima of optimization-based approaches, but are more scalable computationally and more reliable than local optima often found via optimization. Our method uses \Theta(n d \log n) comparisons and \Theta(n^2 d^2) total operations, and can also be viewed as selecting constraints for an optimizer which, if successful, will produce an almost-perfect embedding for sufficiently dense datasets.
- May 22 2018 math.LO arXiv:1805.07586v1 In the present paper, we introduce a multi-type display calculus for dynamic epistemic logic, which we refer to as Dynamic Calculus. The display-approach is suitable to modularly chart the space of dynamic epistemic logics on weaker-than-classical propositional base. The presence of types endows the language of the Dynamic Calculus with additional expressivity, allows for a smooth proof-theoretic treatment, and paves the way towards a general methodology for the design of proof systems for the generality of dynamic logics, and certainly beyond dynamic epistemic logic. We prove that the Dynamic Calculus adequately captures Baltag-Moss-Solecki's dynamic epistemic logic, and enjoys Belnap-style cut elimination.
- We give analogues of the Auslander correspondence for two classes of triangulated categories satisfying certain finiteness conditions. The first class is triangulated categories with additive generators and we consider their endomorphism algebras as the Auslander algebras. For the second one, we introduce the notion of $[1]$-additive generators and consider their graded endomorphism algebras as the Auslander algebras. We give a homological characterization of the Auslander algebras for each class. Along the way, we also show that the algebraic triangle structures on the homotopy categories are unique up to equivalence.
- May 22 2018 cs.CV arXiv:1805.07582v1 In single-pixel imaging (SPI), the target object is illuminated with varying patterns sequentially and an intensity sequence is recorded by a single-pixel detector without spatial resolution. A high quality object image can only be computationally reconstructed after a large number of illuminations, with disadvantages of long imaging time and high cost. Conventionally, object classification is performed after a reconstructed object image with good fidelity is available. In this paper, we propose to classify the target object with a small number of illuminations in a fast manner for Fourier SPI. A naive Bayes classifier is employed to classify the target objects based on the single-pixel intensity sequence without any image reconstruction and each sequence element is regarded as an object feature in the classifier. Simulation results demonstrate our proposed scheme can classify the number digit object images with high accuracy (e.g. 80% accuracy using only 13 illuminations, at a sampling ratio of 0.3%).
- May 22 2018 math.CO arXiv:1805.07576v1 A pair $(A,B)$ of square $(0,1)$-matrices is called a Lehman pair if $AB^T=J+kI$ for some integer $k\in\{-1,1,2,3,\ldots\}$, and the matrices $A$ and $B$ are called Lehman matrices. This terminology arises because Lehman showed that the rows of minimum weight in any non-degenerate minimally nonideal (mni) matrix $M$ form a square Lehman submatrix of $M$. In this paper, we view a Lehman matrix as the bipartite adjacency matrix of a regular bipartite graph, focussing in particular on the case where the graph is cubic. From this perspective, we identify two constructions that generate cubic Lehman graphs from smaller Lehman graphs. The most prolific of these constructions involves repeatedly replacing suitable pairs of edges with a particular $6$-vertex subgraph that we call a $3$-rung ladder segment. Two decades ago, Lütolf and Margot initiated a computational study of mni matrices and constructed a catalogue containing (among other things) a listing of all cubic Lehman matrices with $k =1$ of order up to $17 \times 17$. We verify their catalogue (which has just one omission), and extend the computational results to $20 \times 20$ matrices. Of the $908$ cubic Lehman matrices (with $k=1$) of order up to $20 \times 20$, only two do not arise from our $3$-rung ladder construction. However, these exceptions can be derived from our second construction, and so our two constructions cover all known cubic Lehman matrices with $k=1$.
- Syndromic surveillance detects and monitors individual and population health indicators through sources such as emergency department records. Automated classification of these records can improve outbreak detection speed and diagnosis accuracy. Current syndromic systems rely on hand-coded keyword-based methods to parse written fields and may benefit from the use of modern supervised-learning classifier models. In this paper we implement two recurrent neural network models based on long short-term memory (LSTM) and gated recurrent unit (GRU) cells and compare them to two traditional bag-of-words classifiers: multinomial naive Bayes (MNB) and a support vector machine (SVM). All four models are trained to predict diagnostic code groups as defined by Clinical Classification Software, first to predict from discharge diagnosis, then from chief complaint fields. The classifiers are trained on 3.6 million de-identified emergency department records from a single United States jurisdiction. We compare performance of these models primarily using the F1 score. We measure absolute model performance to determine which conditions are the most amenable to surveillance based on chief complaint alone. Using discharge diagnoses, the LSTM classifier performs best, though all models exhibit an F1 score above 0.96. GRU performs best on chief complaints (F1=0.4859) and MNB with bigrams performs worst (F1=0.3998). Certain syndrome types are easier to detect than others. For example, the GRU predicts alcohol-related disorders well (F1=0.8084) but predicts influenza poorly (F1=0.1363). In all instances the RNN models outperformed the bag-of-words classifiers, suggesting deep learning models could substantially improve the automatic classification of unstructured text for syndromic surveillance.
- May 22 2018 cs.CR arXiv:1805.07570v1 Cloning spare parts and entities of mass products is an old and serious unsolved problem for the automotive industry. The economic losses, together with the loss of know-how, IP theft, and security and safety threats, are huge in all dimensions. This presentation gives an overview of the traditional state of the art in producing clone-resistant electronic units over the last two decades. A survey demonstrates the techniques known so far as Physically Unclonable Functions (PUFs), showing their advantages and drawbacks. The need for mechatronic security in the vehicular environment is emerging as a vital requirement for new automotive security regulations (legal regulations) in the near future. The automotive industry faces the challenge of producing low-cost, highly safe and secure networked automotive systems. The emerging networked smart traffic environment offers new safety services and at the same time creates new needs and threats in a highly networked world. There is a crying need for automotive security that approaches the level of robust biological security for cars as dominating mobility actors in the modern smart life environment. Possible emerging technologies that allow embedding practical mechatronic-security modules as a low-cost digital alternative are presented. Such digital clone-resistant mechatronic units (such as Electronic Control Units, ECUs) may serve as smart security anchors for the automotive environment in the near future. First promising results are also presented.
- Making an informed, correct and quick decision can be life-saving. It's crucial for animals during an escape behaviour or for autonomous cars during driving. The decision can be complex and may involve an assessment of the number of threats present and the nature of each threat. Thus, we should expect early sensory processing to supply classification information fast and accurately, even before relaying the information to higher brain areas or more complex system components downstream. Today, advanced convolutional artificial neural networks can successfully solve such tasks and are commonly used to build complex decision making systems. However, in order to achieve excellent performance on these tasks they require increasingly complex, "very deep" model structures, which are costly in inference run-time, energy consumption and training samples, and only trainable on cloud-computing clusters. A single spiking neuron has been shown to be able to solve many of these required tasks for homogeneous Poisson input statistics, a commonly used model for spiking activity in the neocortex; when modeled as a leaky integrate-and-fire neuron with a gradient descent learning algorithm, it was shown to possess a wide variety of complex computational capabilities. Here we improve its learning algorithm. We also account for more natural stimulus-generated inputs that deviate from this homogeneous Poisson spiking. The improved gradient-based local learning rule allows for significantly better and more stable generalization and more efficient performance. We finally apply our model to a problem of multiple instance learning with counting where labels are only available for collections of concepts. In this counting MNIST task the neuron exploits the improved algorithm and succeeds while outperforming the previously introduced single neuron learning algorithm as well as a conventional ConvNet architecture under similar conditions.
- May 22 2018 cs.CV arXiv:1805.07566v1 With the introduction of large-scale datasets and deep learning models capable of learning complex representations, impressive advances have emerged in face detection and recognition tasks. Despite such advances, existing datasets do not capture the difficulty of face recognition in the wildest scenarios, such as hostile disputes or fights. Furthermore, existing datasets do not represent completely unconstrained cases of low resolution, high blur and large pose/occlusion variances. To this end, we introduce the Wildest Faces dataset, which focuses on such adverse effects through violent scenes. The dataset consists of an extensive set of violent scenes of celebrities from movies. Our experimental results demonstrate that state-of-the-art techniques are not well-suited for violent scenes, and therefore, Wildest Faces is likely to stir further interest in face detection and recognition research.
- Clustering is a technique used in network routing to enhance performance and conserve network resources. This paper presents a cluster-based routing protocol for VANET utilizing a new addressing scheme in which each node gets an address according to its mobility pattern. The Hamming distance technique is then used to partition the network in an address-centric manner. The simulation results show that this protocol enhances routing reachability while reducing routing end-to-end delay and traffic received, compared with two benchmarks, namely AODV and DSDV.
- We introduce a theorem proving algorithm that uses practically no domain heuristics for guiding its connection-style proof search. Instead, it runs many Monte-Carlo simulations guided by reinforcement learning from previous proof attempts. We produce several versions of the prover, parameterized by different learning and guiding algorithms. The strongest version of the system is trained on a large corpus of mathematical problems and evaluated on previously unseen problems. The trained system solves within the same number of inferences over 40% more problems than a baseline prover, which is an unusually high improvement in this hard AI domain. To our knowledge this is the first time reinforcement learning has been convincingly applied to solving general mathematical problems on a large scale.
- In this paper, we propose two new algorithms for transduction with the Matrix Completion (MC) problem. The joint MC and prediction tasks are addressed simultaneously to enhance accuracy, i.e., the label matrix is concatenated to the data matrix, forming a stacked matrix. Assuming the data matrix is of low rank, we propose new recommendation methods by posing the problem as a constrained minimization of the Smoothed Rank Function (SRF). We provide convergence analysis for the proposed algorithms. The simulations are conducted on real datasets in two different scenarios of randomly missing patterns, with and without block loss. The results confirm that the accuracy of our proposed methods exceeds that of state-of-the-art methods by up to 10% at low observation rates for the scenario without block loss. Our accuracy in the latter scenario is comparable to state-of-the-art methods, while the complexity of the proposed algorithms is reduced by up to a factor of 4.
- May 22 2018 math.OC arXiv:1805.07552v1 We present an approach for variational regularization of inverse and imaging problems for recovering functions with values in a set of vectors. We introduce regularization functionals, which are derivative-free double integrals of such functions. These regularization functionals are motivated from double integrals, which approximate Sobolev semi-norms of intensity functions. These were introduced in Bourgain, Brézis and Mironescu, "Another Look at Sobolev Spaces". In: Optimal Control and Partial Differential Equations-Innovations and Applications, IOS press, Amsterdam, 2001. For the proposed regularization functionals we prove existence of minimizers as well as a stability and convergence result for functions with values in a set of vectors.
- May 22 2018 cs.CV arXiv:1805.07550v1 Many of the leading approaches for video understanding are data-hungry and time-consuming, failing to capture the gist of spatial-temporal evolution in an efficient manner. The latest research shows that CNNs can reason about static relations between entities in images. To further exploit this capacity for reasoning about dynamic evolution, we introduce a novel network module called DenseImage Network (DIN) with two main contributions. 1) A novel compact representation of video which distills its significant spatial-temporal evolution into a matrix called DenseImage, primed for efficient video encoding. 2) A simple yet powerful learning strategy based on DenseImage and a temporal-order-preserving CNN is proposed for video understanding, which contains a local temporal correlation constraint capturing temporal evolution at multiple time scales with different filter widths. Extensive experiments on two recent challenging benchmarks demonstrate that our DenseImage Network can accurately capture the common spatial-temporal evolution between similar actions, even with enormous visual variations or different time scales. Moreover, we obtain state-of-the-art results in action and gesture recognition with much lower time and memory cost, indicating its immense potential in video representation and understanding.
- May 22 2018 cs.CV arXiv:1805.07548v1 Deep learning stands at the forefront in many computer vision tasks. However, deep neural networks are usually data-hungry and require a huge amount of well-annotated training samples. Collecting sufficient annotated data is very expensive in many applications, especially for pixel-level prediction tasks such as semantic segmentation. To solve this fundamental issue, we consider a new challenging vision task, Internetly supervised semantic segmentation, which only uses Internet data with noisy image-level supervision of corresponding query keywords for segmentation model training. We address this task by proposing the following solution. A class-specific attention model unifying multiscale forward and backward convolutional features is proposed to provide initial segmentation "ground truth". The model trained with such noisy annotations is then improved by an online fine-tuning procedure. It achieves state-of-the-art performance under the weakly-supervised setting on the PASCAL VOC2012 dataset. The proposed framework also paves a new way towards learning from the Internet without human interaction and could serve as a strong baseline therein. Code and data will be released upon acceptance of the paper.
- May 22 2018 cs.AI arXiv:1805.07547v1 A parameterized skill is a mapping from multiple goals/task parameters to the policy parameters to accomplish them. Existing works in the literature show how a parameterized skill can be learned given a task space that defines all the possible achievable goals. In this work, we focus on tasks defined in terms of final states (goals), and we face the challenge where the agent aims to autonomously acquire a parameterized skill to manipulate an initially unknown environment. In this case, the task space is not known a priori and the agent has to autonomously discover it. The agent may posit as a task space its whole sensory space (i.e. the space of all possible sensor readings), as the achievable goals will certainly be a subset of this space. However, the space of achievable goals may be a very tiny subspace in relation to the whole sensory space; thus, directly using the sensor space as the task space exposes the agent to the curse of dimensionality and makes existing autonomous skill acquisition algorithms inefficient. In this work we present an algorithm that actively discovers the manifold of the achievable goals within the sensor space. We validate the algorithm by employing it in multiple different simulated scenarios where the agent actions achieve different types of goals: moving a redundant arm, pushing an object, and changing the color of an object.
- May 22 2018 cs.CV arXiv:1805.07545v1 Imitation learning for end-to-end autonomous driving has drawn attention from academic communities. Current methods either use only images as input, which is ambiguous when a car approaches an intersection, or use additional command information to navigate the vehicle but are not automated enough. Focusing on making the vehicle drive along the given path, we propose a new navigation command that does not require human participation and a novel model architecture called angle branched network. Both the new navigation command and the angle branched network are easy to understand and effective. Besides, we find that not only segmentation information but also depth information can boost the performance of the driving model. We conduct experiments in a 3D urban simulator, and both qualitative and quantitative evaluation results show the effectiveness of our model.
- Network embeddings map the nodes of a given network into $d$-dimensional Euclidean space $\mathbb{R}^d$. Ideally, this mapping is such that 'similar' nodes are mapped onto nearby points, such that the embedding can be used for purposes such as link prediction (if 'similar' means being 'more likely to be connected') or classification (if 'similar' means 'being more likely to have the same label'). In recent years various methods for network embedding have been introduced. These methods all follow a similar strategy, defining a notion of similarity between nodes (typically deeming nodes more similar if they are nearby in the network in some metric), a distance measure in the embedding space, and minimizing a loss function that penalizes large distances for similar nodes or small distances for dissimilar nodes. A difficulty faced by existing methods is that certain networks are fundamentally hard to embed due to their structural properties, such as (approximate) multipartiteness, certain degree distributions, or certain kinds of assortativity. Overcoming this difficulty, we introduce a conceptual innovation to the literature on network embedding, proposing to create embeddings that maximally add information with respect to such structural properties (e.g. node degrees, block densities, etc.). We use a simple Bayesian approach to achieve this, and propose a block stochastic gradient descent algorithm for fitting it efficiently. Finally, we demonstrate that the combination of information about such structural properties and a Euclidean embedding provides superior performance across a range of link prediction tasks. Moreover, we demonstrate the potential of our approach for network visualization.
- Multitask learning has shown promising performance in many applications and many multitask models have been proposed. In order to identify an effective multitask model for a given multitask problem, we propose a learning framework called learning to multitask (L2MT). To achieve this goal, L2MT exploits historical multitask experience, which is organized as a training set consisting of several tuples, each of which contains a multitask problem with multiple tasks, a multitask model, and the relative test error. Based on such a training set, L2MT first uses a proposed layerwise graph neural network to learn task embeddings for all the tasks in a multitask problem and then learns an estimation function to estimate the relative test error based on task embeddings and the representation of the multitask model based on a unified formulation. Given a new multitask problem, the estimation function is used to identify a suitable multitask model. Experiments on benchmark datasets show the effectiveness of the proposed L2MT framework.
- May 22 2018 math.NA arXiv:1805.07537v1 This paper studies the unconditional strong convergence rate for a fully discrete scheme of semilinear stochastic evolution equations, under a generalized Lipschitz-type condition on both drift and diffusion operators. Applied to the one-dimensional stochastic advection-diffusion-reaction equation with multiplicative white noise, the main theorem shows that the spatial and temporal strong convergence orders are $1/2$ and $1/4$, respectively. This is the first optimal strong approximation result for semilinear SPDEs with gradient term driven by non-trace class noises. Numerical tests are performed to verify theoretical analysis.
- May 22 2018 cs.NE arXiv:1805.07531v1 We consider artificial neurons which update their weight coefficients with an internal rule based on backpropagation, rather than using it as an external training procedure. To achieve this we include the backpropagation error estimate as a separate entity in all the neuron models and perform its exchange along the synaptic connections. In addition, we add a special type of neuron with reference inputs, which serves as a base source of error estimates for the whole network. Finally, we introduce a training control signal for all the neurons, which can enable the correction of weights and the exchange of error estimates. For recurrent neural networks we also demonstrate how to include backpropagation through time in their formalism with the help of stack memory for reference inputs and external data inputs of neurons. As a useful consequence, our approach enables us to introduce neural networks with the adjustment of synaptic connections tied to the incorporated backpropagation. Also, for widely used neural networks, such as long short-term memory, radial basis function networks, multilayer perceptrons and convolutional neural networks, we demonstrate their alternative description within the framework of our new formalism.
- May 22 2018 math.NT arXiv:1805.07530v1 A dessin d'enfant, or dessin, is a bicolored graph embedded into a Riemann surface. Acyclic dessins can be described analytically by pre-images of certain polynomials, called Shabat polynomials, and also algebraically by their monodromy groups, that is, the group generated by rotations of edges about black and white vertices. In this paper we investigate the Shabat polynomials and monodromy groups of planar acyclic dessins that are uniquely determined by their passports.
- May 22 2018 cs.CV arXiv:1805.07526v1 Inspired by "predictive coding", a theory in neuroscience, we develop a bi-directional and dynamical neural network with local recurrent processing, namely predictive coding network (PCN). Unlike any feedforward-only convolutional neural network, PCN includes both feedback connections, which carry top-down predictions, and feedforward connections, which carry bottom-up errors of prediction. Feedback and feedforward connections enable adjacent layers to interact locally and recurrently to refine representations towards minimization of layer-wise prediction errors. When unfolded over time, the recurrent processing gives rise to an increasingly deeper hierarchy of non-linear transformation, allowing a shallow network to dynamically extend itself into an arbitrarily deep network. We train and test PCN for image classification with SVHN, CIFAR and ImageNet datasets. Despite notably fewer layers and parameters, PCN achieves competitive performance compared to classical and state-of-the-art models. Further analysis shows that the internal representations in PCN converge over time and yield increasingly better accuracy in object recognition. Errors of top-down prediction also map visual saliency or bottom-up attention. This work takes us one step closer to bridging human and machine intelligence in vision.
- We have obtained an integral representation of the shallow neural network that attains the global minimum of its backpropagation (BP) training problem. According to our unpublished numerical simulations conducted several years prior to this study, we had noticed that such an integral representation may exist, but it was not proven until today. First, we introduced a Hilbert space of coefficient functions, and a reproducing kernel Hilbert space (RKHS) of hypotheses, associated with the integral representation. The RKHS reflects the approximation ability of neural networks. Second, we established the ridgelet analysis on RKHS. The analytic property of the integral representation is remarkably clear. Third, we reformulated the BP training as an optimization problem in the space of coefficient functions, and obtained a formal expression of the unique global minimizer, according to the Tikhonov regularization theory. Finally, we demonstrated that the global minimizer is the shrink ridgelet transform. Since the relation between an integral representation and an ordinary finite network is not clear, and BP is convex in the integral representation, we cannot immediately answer questions such as "Is a local minimum a global minimum?" However, the obtained integral representation provides an explicit expression of the global minimizer, without linearity-like assumptions, such as partial linearity and monotonicity. Furthermore, it indicates that the ordinary ridgelet transform provides the minimum norm solution to the original training equation.