Detectability is a basic property of dynamic systems: when it holds one can use the observed output signal produced by a system to reconstruct its current state. In this paper, we consider properties of this type in the framework of discrete event systems modeled by Petri nets (a.k.a. place/transition nets). We first study weak detectability and weak approximate detectability. The former implies that there exists an evolution of the net such that all corresponding observed output sequences longer than a given value allow one to reconstruct the current marking (i.e., state). The latter implies that there exists an evolution of the net such that all corresponding observed output sequences longer than a given value allow one to determine if the current marking belongs to a given set. We show that the problem of verifying the first property for labeled place/transition nets with inhibitor arcs and the problem of verifying the second property for labeled place/transition nets are both undecidable. We also consider a property called instant strong detectability which implies that for all possible evolutions the corresponding observed output sequence allows one to reconstruct the current marking. We show that the problem of verifying this property for labeled place/transition nets is decidable while its inverse problem is EXPSPACE-hard.
In this paper, we propose several opacity-preserving (bi)simulation relations for general nondeterministic transition systems (NTS) in terms of initial-state opacity, current-state opacity, K-step opacity, and infinite-step opacity. We also show how one can leverage quotient construction to compute such relations. In addition, we use a two-way observer method to verify opacity of nondeterministic finite transition systems (NFTSs). As a result, although the verification of opacity for infinite NTSs is generally undecidable, if one can find such an opacity-preserving relation from an infinite NTS to an NFTS, the (lack of) opacity of the NTS can be easily verified over the NFTS which is decidable.
Feb 06 2018 cs.AI
This paper describes a problem arising in sea exploration, where the aim is to schedule the expedition of a ship for collecting information about the resources on the seafloor. The aim is to collect data by probing on a set of carefully chosen locations, so that the information available is optimally enriched. This problem has similarities with the orienteering problem, where the aim is to plan a time-limited trip for visiting a set of vertices, collecting a prize at each of them, in such a way that the total value collected is maximum. In our problem, the score at each vertex is associated with an estimation of the level of the resource on the given surface, which is done by regression using Gaussian processes. Hence, there is a correlation among scores on the selected vertices; this is a first difference with respect to the standard orienteering problem. The second difference is the location of each vertex, which in our problem is a freely chosen point on a given surface.
Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient's record. We propose a representation of patients' entire, raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two U.S. academic medical centers with 216,221 adult patients hospitalized for at least 24 hours. In the sequential format we propose, this volume of EHR data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for tasks such as predicting in-hospital mortality (AUROC across sites 0.93-0.94), 30-day unplanned readmission (AUROC 0.75-0.76), prolonged length of stay (AUROC 0.85-0.86), and all of a patient's final discharge diagnoses (frequency-weighted AUROC 0.90). These models outperformed state-of-the-art traditional predictive models in all cases. We also present a case-study of a neural-network attribution system, which illustrates how clinicians can gain some transparency into the predictions. We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios, complete with explanations that directly highlight evidence in the patient's chart.
It has been shown that rules can be extracted from highly non-linear, recursive models such as recurrent neural networks (RNNs). The RNN models mostly investigated include both Elman networks and second-order recurrent networks. Recently, new types of RNNs have demonstrated superior power in handling many machine learning tasks, especially when structural data is involved such as language modeling. Here, we empirically evaluate different recurrent models on the task of learning deterministic finite automata (DFA), the seven Tomita grammars. We are interested in the capability of recurrent models with different architectures in learning and expressing regular grammars, which can be the building blocks for many applications dealing with structural data. Our experiments show that a second-order RNN provides the best and stablest performance of extracting DFA over all Tomita grammars and that other RNN models are greatly influenced by different Tomita grammars. To better understand these results, we provide a theoretical analysis of the "complexity" of different grammars, by introducing the entropy and the averaged edit distance of regular grammars defined in this paper. Through our analysis, we categorize all Tomita grammars into different classes, which explains the inconsistency in the performance of extraction observed across all RNN models.
Face authentication systems are becoming increasingly prevalent, especially with the rapid development of Deep Learning technologies. However, human facial information is easy to be captured and reproduced, which makes face authentication systems vulnerable to various attacks. Liveness detection is an important defense technique to prevent such attacks, but existing solutions did not provide clear and strong security guarantees, especially in terms of time. To overcome these limitations, we propose a new liveness detection protocol called Face Flashing that significantly increases the bar for launching successful attacks on face authentication systems. By randomly flashing well-designed pictures on a screen and analyzing the reflected light, our protocol has leveraged physical characteristics of human faces: reflection processing at the speed of light, unique textual features, and uneven 3D shapes. Cooperating with working mechanism of the screen and digital cameras, our protocol is able to detect subtle traces left by an attacking process. To demonstrate the effectiveness of Face Flashing, we implemented a prototype and performed thorough evaluations with large data set collected from real-world scenarios. The results show that our Timing Verification can effectively detect the time gap between legitimate authentications and malicious cases. Our Face Verification can also differentiate 2D plane from 3D objects accurately. The overall accuracy of our liveness detection system is 98.8\%, and its robustness was evaluated in different scenarios. In the worst case, our system's accuracy decreased to a still-high 97.3\%.
Jan 08 2018 cs.CR
In this paper, we seek to better understand Android obfuscation and depict a holistic view of the usage of obfuscation through a large-scale investigation in the wild. In particular, we focus on four popular obfuscation approaches: identifier renaming, string encryption, Java reflection, and packing. To obtain the meaningful statistical results, we designed efficient and lightweight detection models for each obfuscation technique and applied them to our massive APK datasets (collected from Google Play, multiple third-party markets, and malware databases). We have learned several interesting facts from the result. For example, malware authors use string encryption more frequently, and more apps on third-party markets than Google Play are packed. We are also interested in the explanation of each finding. Therefore we carry out in-depth code analysis on some Android apps after sampling. We believe our study will help developers select the most suitable obfuscation approach, and in the meantime help researchers improve code analysis systems in the right direction.
Dec 19 2017 cs.CV
Recent years have witnessed the unprecedented success of deep convolutional neural networks (CNNs) in single image super-resolution (SISR). However, existing CNN-based SISR methods mostly assume that a low-resolution (LR) image is bicubicly downsampled from a high-resolution (HR) image, thus inevitably giving rise to poor performance when the true degradation does not follow this assumption. Moreover, they lack scalability in learning a single model to deal with multiple degradations. To address these issues, we propose a dimensionality stretching strategy that enables a single convolutional super-resolution network to take two key factors of the SISR degradation process, i.e., blur kernel and noise level, as input. Consequently, the proposed super-resolver can handle multiple and even spatially variant degradations, which significantly improves the practicability. Extensive experimental results on synthetic and real LR images show that the proposed convolutional super-resolution network not only can produce favorable results on multiple degradations but also is computationally efficient, providing a highly effective and scalable solution to practical SISR applications.
In recent years, wireless networks are undergoing the paradigm transformation from high-speed data rate to connecting everything, raising the development of Internet of things (IoT). As a typical scenario of IoT, vehicular networks have attracted extensive attentions from automotive and telecommunication industry, especially for ultra-reliable and low latency communication (URLLC) applications. However, the conventional wireless networks are mainly designed for promoting spectral efficiency, without paying sufficient attention to the URLLC requirements. To this end, this paper aims at investigating latency and reliability in vehicular networks. Specifically, we first present promising URLLC applications and requirements. Several performance evaluation methods are subsequently discussed from the viewpoints of physical and media access control layers. Taking advantage of these methods, the underlying transmission techniques and scheduling algorithms can be assessed efficiently and accurately. Finally, some promising solutions are presented in order to tackle potential challenges.
This paper is concerned with how to make efficient use of social information to improve recommendations. Most existing social recommender systems assume people share similar preferences with their social friends. Which, however, may not hold true due to various motivations of making online friends and dynamics of online social networks. Inspired by recent causal process based recommendations that first model user exposures towards items and then use these exposures to guide rating prediction, we utilize social information to capture user exposures rather than user preferences. We assume that people get information of products from their online friends and they do not have to share similar preferences, which is less restrictive and seems closer to reality. Under this new assumption, in this paper, we present a novel recommendation approach (named SERec) to integrate social exposure into collaborative filtering. We propose two methods to implement SERec, namely social regularization and social boosting, each with different ways to construct social exposures. Experiments on four real-world datasets demonstrate that our methods outperform the state-of-the-art methods on top-N recommendations. Further study compares the robustness and scalability of the two proposed methods.
Applying machine learning in the health care domain has shown promising results in recent years. Interpretable outputs from learning algorithms are desirable for decision making by health care personnel. In this work, we explore the possibility of utilizing causal relationships to refine diagnostic prediction. We focus on the task of diagnostic prediction using discomfort drawings, and explore two ways to employ causal identification to improve the diagnostic results. Firstly, we use causal identification to infer the causal relationships among diagnostic labels which, by itself, provides interpretable results to aid the decision making and training of health care personnel. Secondly, we suggest a post-processing approach where the inferred causal relationships are used to refine the prediction accuracy of a multi-view probabilistic model. Experimental results show firstly that causal identification is capable of detecting the causal relationships among diagnostic labels correctly, and secondly that there is potential for improving pain diagnostics prediction accuracy using the causal relationships.
Nov 21 2017 cs.CV
Imaging objects that are obscured by scattering and occlusion is an important challenge for many applications. For example, navigation and mapping capabilities of autonomous vehicles could be improved, vision in harsh weather conditions or under water could be facilitated, or search and rescue scenarios could become more effective. Unfortunately, conventional cameras cannot see around corners. Emerging, time-resolved computational imaging systems, however, have demonstrated first steps towards non-line-of-sight (NLOS) imaging. In this paper, we develop an algorithmic framework for NLOS imaging that is robust to partial occlusions within the hidden scenes. This is a common light transport effect, but not adequately handled by existing NLOS reconstruction algorithms, resulting in fundamental limitations in what types of scenes can be recovered. We demonstrate state-of-the-art NLOS reconstructions in simulation and with a prototype single photon avalanche diode (SPAD) based acquisition system.
Oct 12 2017 cs.CV
Due to the fast inference and good performance, discriminative learning methods have been widely studied in image denoising. However, these methods mostly learn a specific model for each noise level, and require multiple models for denoising images with different noise levels. They also lack flexibility to deal with spatially variant noise, limiting their applications in practical denoising. To address these issues, we present a fast and flexible denoising convolutional neural network, namely FFDNet, with a tunable noise level map as the input. The proposed FFDNet works on downsampled sub-images to speed up the inference, and adopts orthogonal regularization to enhance the generalization ability. In contrast to the existing discriminative denoisers, FFDNet enjoys several desirable properties, including (i) the ability to handle a wide range of noise levels (i.e., [0, 75]) effectively with a single network, (ii) the ability to remove spatially variant noise by specifying a non-uniform noise level map, and (iii) faster speed than benchmark BM3D even on CPU without sacrificing denoising performance. Extensive experiments on synthetic and real noisy images are conducted to evaluate FFDNet in comparison with state-of-the-art denoisers. The results show that FFDNet is effective and efficient, making it highly attractive for practical denoising applications.
Oct 10 2017 cs.CV
Automatically predicting age group and gender from face images acquired in unconstrained conditions is an important and challenging task in many real-world applications. Nevertheless, the conventional methods with manually-designed features on in-the-wild benchmarks are unsatisfactory because of incompetency to tackle large variations in unconstrained images. This difficulty is alleviated to some degree through Convolutional Neural Networks (CNN) for its powerful feature representation. In this paper, we propose a new CNN based method for age group and gender estimation leveraging Residual Networks of Residual Networks (RoR), which exhibits better optimization ability for age group and gender classification than other CNN architectures.Moreover, two modest mechanisms based on observation of the characteristics of age group are presented to further improve the performance of age estimation.In order to further improve the performance and alleviate over-fitting problem, RoR model is pre-trained on ImageNet firstly, and then it is fune-tuned on the IMDB-WIKI-101 data set for further learning the features of face images, finally, it is used to fine-tune on Adience data set. Our experiments illustrate the effectiveness of RoR method for age and gender estimation in the wild, where it achieves better performance than other CNN methods. Finally, the RoR-152+IMDB-WIKI-101 with two mechanisms achieves new state-of-the-art results on Adience benchmark.
Oct 03 2017 cs.CV
The Residual Networks of Residual Networks (RoR) exhibits excellent performance in the image classification task, but sharply increasing the number of feature map channels makes the characteristic information transmission incoherent, which losses a certain of information related to classification prediction, limiting the classification performance. In this paper, a Pyramidal RoR network model is proposed by analysing the performance characteristics of RoR and combining with the PyramidNet. Firstly, based on RoR, the Pyramidal RoR network model with channels gradually increasing is designed. Secondly, we analysed the effect of different residual block structures on performance, and chosen the residual block structure which best favoured the classification performance. Finally, we add an important principle to further optimize Pyramidal RoR networks, drop-path is used to avoid over-fitting and save training time. In this paper, image classification experiments were performed on CIFAR-10/100 and SVHN datasets, and we achieved the current lowest classification error rates were 2.96%, 16.40% and 1.59%, respectively. Experiments show that the Pyramidal RoR network optimization method can improve the network performance for different data sets and effectively suppress the gradient disappearance problem in DCNN training.
Oct 02 2017 cs.LG
Rule extraction from black-box models is critical in domains that require model validation before implementation, as can be the case in credit scoring and medical diagnosis. Though already a challenging problem in statistical learning in general, the difficulty is even greater when highly non-linear, recursive models, such as recurrent neural networks (RNNs), are fit to data. Here, we study the extraction of rules from second order recurrent neural networks trained to recognize the Tomita grammars. We show that production rules can be stably extracted from trained RNNs and that in certain cases the rules outperform the trained RNNs.
Understanding human mobility is critical for decision support in areas from urban planning to infectious diseases control. Prior work has focused on tracking daily logs of outdoor mobility without considering relevant context, which contain a mixture of regular and irregular human movement for a range of purposes, and thus diverse effects on the dynamics have been ignored. This study aims to focus on irregular human movement of different meta-populations with various purposes. We propose approaches to estimate the predictability of mobility in different contexts. With our survey data from international and domestic visitors to Australia, we found that the travel patterns of Europeans visiting for holidays are less predictable than those visiting for education, while East Asian visitors show the opposite patterns, ie, more predictable for holidays than for education. Domestic residents from the most populous Australian states exhibit the most unpredictable patterns, while visitors from less populated states show the highest predictable movement.
Device-to-device (D2D) communications recently have attracted much attention for its potential capability to improve spectral efficiency underlaying the existing heterogeneous networks (HetNets). Due to no sophisticated control, D2D user equipments (DUEs) themselves cannot resist eavesdropping or security attacks. It is urgent to maximize the secure capacity for both cellular users and DUEs. This paper formulates the radio resource allocation problem to maximize the secure capacity of DUEs for the D2D communication underlaying HetNets which consist of high power nodes and low power nodes. The optimization objective function with transmit bit rate and power constraints, which is non-convex and hard to be directly derived, is firstly transformed into matrix form. Then the equivalent convex form of the optimization problem is derived according to the Perron-Frobenius theory. A heuristic iterative algorithm based on the proximal theory is proposed to solve this equivalent convex problem through evaluating the proximal operator of Lagrange function. Numerical results show that the proposed radio resource allocation solution significantly improves the secure capacity with a fast convergence speed.
As a promising paradigm for the fifth generation wireless communication (5G) system, the fog radio access network (F-RAN) has been proposed as an advanced socially-aware mobile networking architecture to provide high spectral efficiency (SE) while maintaining high energy efficiency (EE) and low latency. Recent advents are advocated to the performance analysis and radio resource allocation, both of which are fundamental issues to make F-RANs successfully rollout. This article comprehensively summarizes the recent advances of the performance analysis and radio resource allocation in F-RANs. Particularly, the advanced edge cache and adaptive model selection schemes are presented to improve SE and EE under maintaining a low latency level. The radio resource allocation strategies to optimize SE and EE in F-RANs are respectively proposed. A few open issues in terms of the F-RAN based 5G architecture and the social-awareness technique are identified as well.
Negative afterimage appears in our vision when we shift our gaze from an over stimulated original image to a new area with a uniform color. The colors of negative afterimages differ from the old stimulating colors in the original image when the color in the new area is either neutral or chromatic. The interaction between stimulating colors in the test and inducing field in the original image changes our color perception due to simultaneous contrast, and the interaction between changed colors perceived in the previously-viewed field and the color in the currently-viewed field also affects our perception of colors in negative afterimages due to successive contrast. Based on these observations we propose a computational model to estimate colors of negative afterimages in more general cases where the original stimulating color in the test field is chromatic, and the original stimulating color in the inducing field and the new stimulating color can be either neutral or chromatic. We validate our model with human experiments.
In this paper, we investigate multi-message authentication to combat adversaries with infinite computational capacity. An authentication framework over a wiretap channel $(W_1,W_2)$ is proposed to achieve information-theoretic security with the same key. The proposed framework bridges the two research areas in physical (PHY) layer security: secure transmission and message authentication. Specifically, the sender Alice first transmits message $M$ to the receiver Bob over $(W_1,W_2)$ with an error correction code; then Alice employs a hash function (i.e., $\varepsilon$-AWU$_2$ hash functions) to generate a message tag $S$ of message $M$ using key $K$, and encodes $S$ to a codeword $X^n$ by leveraging an existing strongly secure channel coding with exponentially small (in code length $n$) average probability of error; finally, Alice sends $X^n$ over $(W_1,W_2)$ to Bob who authenticates the received messages. We develop a theorem regarding the requirements/conditions for the authentication framework to be information-theoretic secure for authenticating a polynomial number of messages in terms of $n$. Based on this theorem, we propose an authentication protocol that can guarantee the security requirements, and prove its authentication rate can approach infinity when $n$ goes to infinity. Furthermore, we design and implement an efficient and feasible authentication protocol over binary symmetric wiretap channel (BSWC) by using \emphLinear Feedback Shifting Register based (LFSR-based) hash functions and strong secure polar code. Through extensive experiments, it is demonstrated that the proposed protocol can achieve low time cost, high authentication rate, and low authentication error rate.
Jul 20 2017 cs.CV
Identity transformations, used as skip-connections in residual networks, directly connect convolutional layers close to the input and those close to the output in deep neural networks, improving information flow and thus easing the training. In this paper, we introduce two alternative linear transforms, orthogonal transformation and idempotent transformation. According to the definition and property of orthogonal and idempotent matrices, the product of multiple orthogonal (same idempotent) matrices, used to form linear transformations, is equal to a single orthogonal (idempotent) matrix, resulting in that information flow is improved and the training is eased. One interesting point is that the success essentially stems from feature reuse and gradient reuse in forward and backward propagation for maintaining the information during flow and eliminating the gradient vanishing problem because of the express way through skip-connections. We empirically demonstrate the effectiveness of the proposed two transformations: similar performance in single-branch networks and even superior in multi-branch networks in comparison to identity transformations.
While autoencoders are a key technique in representation learning for continuous structures, such as images or wave forms, developing general-purpose autoencoders for discrete structures, such as text sequence or discretized images, has proven to be more challenging. In particular, discrete inputs make it more difficult to learn a smooth encoder that preserves the complex local relationships in the input space. In this work, we propose an adversarially regularized autoencoder (ARAE) with the goal of learning more robust discrete-space representations. ARAE jointly trains both a rich discrete-space encoder, such as an RNN, and a simpler continuous space generator function, while using generative adversarial network (GAN) training to constrain the distributions to be similar. This method yields a smoother contracted code space that maps similar inputs to nearby codes, and also an implicit latent variable GAN model for generation. Experiments on text and discretized images demonstrate that the GAN model produces clean interpolations and captures the multimodality of the original space, and that the autoencoder produces improve- ments in semi-supervised learning as well as state-of-the-art results in unaligned text style transfer task using only a shared continuous-space representation.
Measurement error in the observed values of the variables can greatly change the output of various causal discovery methods. This problem has received much attention in multiple fields, but it is not clear to what extent the causal model for the measurement-error-free variables can be identified in the presence of measurement error with unknown variance. In this paper, we study precise sufficient identifiability conditions for the measurement-error-free causal model and show what information of the causal model can be recovered from observed data. In particular, we present two different sets of identifiability conditions, based on the second-order statistics and higher-order statistics of the data, respectively. The former was inspired by the relationship between the generating model of the measurement-error-contaminated data and the factor analysis model, and the latter makes use of the identifiability result of the over-complete independent component analysis problem.
We study causal inference in a multi-environment setting, in which the functional relations for producing the variables from their direct causes remain the same across environments, while the distribution of exogenous noises may vary. We introduce the idea of using the invariance of the functional relations of the variables to their causes across a set of environments. We define a notion of completeness for a causal inference algorithm in this setting and prove the existence of such algorithm by proposing the baseline algorithm. Additionally, we present an alternate algorithm that has significantly improved computational and sample complexity compared to the baseline algorithm. The experiment results show that the proposed algorithm outperforms the other existing algorithms.
It is oftentimes impossible to understand how machine learning models reach a decision. While recent research has proposed various technical approaches to provide some clues as to how a learning model makes individual decisions, they cannot provide users with ability to inspect a learning model as a complete entity. In this work, we propose a new technical approach that augments a Bayesian regression mixture model with multiple elastic nets. Using the enhanced mixture model, we extract explanations for a target model through global approximation. To demonstrate the utility of our approach, we evaluate it on different learning models covering the tasks of text mining and image recognition. Our results indicate that the proposed approach not only outperforms the state-of-the-art technique in explaining individual decisions but also provides users with an ability to discover the vulnerabilities of a learning model.
In order to characterize the fundamental limit of the tradeoff between the amount of cache memory and the delivery transmission rate of multiuser caching systems, various coding schemes have been proposed in the literature. These schemes can largely be categorized into two classes, namely uncoded prefetching schemes and coded prefetching schemes. While uncoded prefetching schemes in general offer order-wise optimal performance, coded prefetching schemes often have better performance at the low cache memory regime. The significant differences in the coding components between the two classes may leave the impression that they are largely unrelated. In this work, we provide a connection between the uncoded prefetching scheme proposed by Maddah Ali and Niesen (and its improved version by Yu et al) and the coded prefetching scheme proposed by Tian and Chen. A critical observation is first given where a coding component in the Tian-Chen scheme can be replaced by a binary .code, which enables us to view the two schemes as the extremes of a more general scheme. An explicit example is given to show that the intermediate operating points of this general scheme can in fact provide new memory-rate tradeoff points previously not known to be achievable in the literature. This new general coding scheme is then presented and analyzed rigorously, which yields a new inner bound to the memory-rate tradeoff for the caching problem.
Apr 12 2017 cs.CV
Model-based optimization methods and discriminative learning methods have been the two dominant strategies for solving various inverse problems in low-level vision. Typically, those two kinds of methods have their respective merits and drawbacks, e.g., model-based optimization methods are flexible for handling different inverse problems but are usually time-consuming with sophisticated priors for the purpose of good performance; in the meanwhile, discriminative learning methods have fast testing speed but their application range is greatly restricted by the specialized task. Recent works have revealed that, with the aid of variable splitting techniques, denoiser prior can be plugged in as a modular part of model-based optimization methods to solve other inverse problems (e.g., deblurring). Such an integration induces considerable advantage when the denoiser is obtained via discriminative learning. However, the study of integration with fast discriminative denoiser prior is still lacking. To this end, this paper aims to train a set of fast and effective CNN (convolutional neural network) denoisers and integrate them into model-based optimization method to solve other inverse problems. Experimental results demonstrate that the learned set of denoisers not only achieve promising Gaussian denoising results but also can be used as prior to deliver good performance for various low-level vision applications.
Measuring conditional dependencies among the variables of a network is of great interest to many disciplines. This paper studies some shortcomings of the existing dependency measures in detecting direct causal influences or their lack of ability for group selection to capture strong dependencies and accordingly introduces a new statistical dependency measure to overcome them. This measure is inspired by Dobrushin's coefficients and based on the fact that there is no dependency between $X$ and $Y$ given another variable $Z$, if and only if the conditional distribution of $Y$ given $X=x$ and $Z=z$ does not change when $X$ takes another realization $x'$ while $Z$ takes the same realization $z$. We show the advantages of this measure over the related measures in the literature. Moreover, we establish the connection between our measure and the integral probability metric (IPM) that helps to develop estimators of the measure with lower complexity compared to other relevant information theoretic based measures. Finally, we show the performance of this measure through numerical simulations.
Mar 30 2017 cs.CR
Inspired by the boom of the consumer IoT market, many device manufacturers, start-up companies and technology giants have jumped into the space. Unfortunately, the exciting utility and rapid marketization of IoT, come at the expense of privacy and security. Industry reports and academic work have revealed many attacks on IoT systems, resulting in privacy leakage, property loss and large-scale availability problems. To mitigate such threats, a few solutions have been proposed. However, it is still less clear what are the impacts they can have on the IoT ecosystem. In this work, we aim to perform a comprehensive study on reported attacks and defenses in the realm of IoT aiming to find out what we know, where the current studies fall short and how to move forward. To this end, we first build a toolkit that searches through massive amount of online data using semantic analysis to identify over 3000 IoT-related articles. Further, by clustering such collected data using machine learning technologies, we are able to compare academic views with the findings from industry and other sources, in an attempt to understand the gaps between them, the trend of the IoT security risks and new problems that need further attention. We systemize this process, by proposing a taxonomy for the IoT ecosystem and organizing IoT security into five problem areas. We use this taxonomy as a beacon to assess each IoT work across a number of properties we define. Our assessment reveals that relevant security and privacy problems are far from solved. We discuss how each proposed solution can be applied to a problem area and highlight their strengths, assumptions and constraints. We stress the need for a security framework for IoT vendors and discuss the trend of shifting security liability to external or centralized entities. We also identify open research problems and provide suggestions towards a secure IoT ecosystem.
Mar 27 2017 cs.CV
Recent years have witnessed great success of convolutional neural network (CNN) for various problems both in low and high level visions. Especially noteworthy is the residual network which was originally proposed to handle high-level vision problems and enjoys several merits. This paper aims to extend the merits of residual network, such as skip connection induced fast training, for a typical low-level vision problem, i.e., single image super-resolution. In general, the two main challenges of existing deep CNN for supper-resolution lie in the gradient exploding/vanishing problem and large numbers of parameters or computational cost as CNN goes deeper. Correspondingly, the skip connections or identity mapping shortcuts are utilized to avoid gradient exploding/vanishing problem. In addition, the skip connections have naturally centered the activation which led to better performance. To tackle with the second problem, a lightweight CNN architecture which has carefully designed width, depth and skip connections was proposed. In particular, a strategy of gradually varying the shape of network has been proposed for residual network. Different residual architectures for image super-resolution have also been compared. Experimental results have demonstrated that the proposed CNN model can not only achieve state-of-the-art PSNR and SSIM results for single image super-resolution but also produce visually pleasant results. This paper has extended the mmm 2017 oral conference paper with a considerable new analyses and more experiments especially from the perspective of centering activations and ensemble behaviors of residual network.
Mar 07 2017 cs.CL
Computation of semantic similarity between concepts is an important foundation for many research works. This paper focuses on IC computing methods and IC measures, which estimate the semantic similarities between concepts by exploiting the topological parameters of the taxonomy. Based on analyzing representative IC computing methods and typical semantic similarity measures, we propose a new hybrid IC computing method. Through adopting the parameter dhyp and lch, we utilize the new IC computing method and propose a novel comprehensive measure of semantic similarity between concepts. An experiment based on WordNet "is a" taxonomy has been designed to test representative measures and our measure on benchmark dataset R&G, and the results show that our measure can obviously improve the similarity accuracy. We evaluate the proposed approach by comparing the correlation coefficients between five measures and the artificial data. The results show that our proposal outperforms the previous measures.
We study the problem of learning the support of transition matrix between random processes in a Vector Autoregressive (VAR) model from samples when a subset of the processes are latent. It is well known that ignoring the effect of the latent processes may lead to very different estimates of the influences among observed processes, and we are concerned with identifying the influences among the observed processes, those between the latent ones, and those from the latent to the observed ones. We show that the support of transition matrix among the observed processes and lengths of all latent paths between any two observed processes can be identified successfully under some conditions on the VAR model. From the lengths of latent paths, we reconstruct the latent subgraph (representing the influences among the latent processes) with a minimum number of variables uniquely if its topology is a directed tree. Furthermore, we propose an algorithm that finds all possible minimal latent graphs under some conditions on the lengths of latent paths. Our results apply to both non-Gaussian and Gaussian cases, and experimental results on various synthetic and real-world datasets validate our theoretical results.
We have developed an efficient information-maximization method for computing the optimal shapes of tuning curves of sensory neurons by optimizing the parameters of the underlying feedforward network model. When applied to the problem of population coding of visual motion with multiple directions, our method yields several types of tuning curves with both symmetric and asymmetric shapes that resemble what have been found in the visual cortex. Our result suggests that the diversity or heterogeneity of tuning curve shapes as observed in neurophysiological experiment might actually constitute an optimal population representation of visual motions with multiple components.
Dec 28 2016 cs.CV
Video object segmentation is challenging due to the factors like rapidly fast motion, cluttered backgrounds, arbitrary object appearance variation and shape deformation. Most existing methods only explore appearance information between two consecutive frames, which do not make full use of the usefully long-term nonlocal information that is helpful to make the learned appearance stable, and hence they tend to fail when the targets suffer from large viewpoint changes and significant non-rigid deformations. In this paper, we propose a simple yet effective approach to mine the long-term sptatio-temporally nonlocal appearance information for unsupervised video segmentation. The motivation of our algorithm comes from the spatio-temporal nonlocality of the region appearance reoccurrence in a video. Specifically, we first generate a set of superpixels to represent the foreground and background, and then update the appearance of each superpixel with its long-term sptatio-temporally nonlocal counterparts generated by the approximate nearest neighbor search method with the efficient KD-tree algorithm. Then, with the updated appearances, we formulate a spatio-temporal graphical model comprised of the superpixel label consistency potentials. Finally, we generate the segmentation by optimizing the graphical model via iteratively updating the appearance model and estimating the labels. Extensive evaluations on the SegTrack and Youtube-Objects datasets demonstrate the effectiveness of the proposed method, which performs favorably against some state-of-art methods.
Dec 23 2016 cs.CR
Given a large number of low-level heterogeneous categorical alerts from an anomaly detection system, how to characterize complex relationships between different alerts, filter out false positives, and deliver trustworthy rankings and suggestions to end users? This problem is motivated by and generalized from applications in enterprise security and attack scenario reconstruction. While existing techniques focus on either reconstructing abnormal scenarios or filtering out false positive alerts, it can be more advantageous to consider the two perspectives simultaneously in order to improve detection accuracy and better understand anomaly behaviors. In this paper, we propose CAR, a collaborative alerts ranking framework that exploits both temporal and content correlations from heterogeneous categorical alerts. CAR first builds a tree-based model to capture both short-term correlations and long-term dependencies in each alert sequence, which identifies abnormal action sequences. Then, an embedding-based model is employed to learn the content correlations between alerts via their heterogeneous categorical attributes. Finally, by incorporating both temporal and content dependencies into one optimization framework, CAR ranks both alerts and their corresponding alert patterns. Our experiments, using real-world enterprise monitoring data and real attacks launched by professional hackers, show that CAR can accurately identify true positive alerts and successfully reconstruct attack scenarios at the same time.
We consider the conjecture by Aichholzer, Aurenhammer, Hurtado, and Krasser that any two points sets with the same cardinality and the same size convex hull can be triangulated in the "same" way, more precisely via \emphcompatible triangulations. We show counterexamples to various strengthened versions of this conjecture.
Dec 09 2016 cs.CV
This paper presents an efficient approach to image segmentation that approximates the piecewise-smooth (PS) functional in  with explicit solutions. By rendering some rational constraints on the initial conditions and the final solutions of the PS functional, we propose two novel formulations which can be approximated to be the explicit solutions of the evolution partial differential equations (PDEs) of the PS model, in which only one PDE needs to be solved efficiently. Furthermore, an energy term that regularizes the level set function to be a signed distance function is incorporated into our evolution formulation, and the time-consuming re-initialization is avoided. Experiments on synthetic and real images show that our method is more efficient than both the PS model and the local binary fitting (LBF) model , while having similar segmentation accuracy as the LBF model.
Dec 06 2016 cs.LG
Deep neural networks (DNNs) have proven to be quite effective in a vast array of machine learning tasks, with recent examples in cyber security and autonomous vehicles. Despite the superior performance of DNNs in these applications, it has been recently shown that these models are susceptible to a particular type of attack that exploits a fundamental flaw in their design. This attack consists of generating particular synthetic examples referred to as adversarial samples. These samples are constructed by slightly manipulating real data-points in order to "fool" the original DNN model, forcing it to mis-classify previously correctly classified samples with high confidence. Addressing this flaw in the model is essential if DNNs are to be used in critical applications such as those in cyber security. Previous work has provided various learning algorithms to enhance the robustness of DNN models, and they all fall into the tactic of "security through obscurity". This means security can be guaranteed only if one can obscure the learning algorithms from adversaries. Once the learning technique is disclosed, DNNs protected by these defense mechanisms are still susceptible to adversarial samples. In this work, we investigate this issue shared across previous research work and propose a generic approach to escalate a DNN's resistance to adversarial samples. More specifically, our approach integrates a data transformation module with a DNN, making it robust even if we reveal the underlying learning algorithm. To demonstrate the generality of our proposed approach and its potential for handling cyber security applications, we evaluate our method and several other existing solutions on datasets publicly available. Our results indicate that our approach typically provides superior classification performance and resistance in comparison with state-of-art solutions.
A framework is presented for unsupervised learning of representations based on infomax principle for large-scale neural populations. We use an asymptotic approximation to the Shannon's mutual information for a large neural population to demonstrate that a good initial approximation to the global information-theoretic optimum can be obtained by a hierarchical infomax method. Starting from the initial solution, an efficient algorithm based on gradient descent of the final objective function is proposed to learn representations from the input datasets, and the method works for complete, overcomplete, and undercomplete bases. As confirmed by numerical experiments, our method is robust and highly efficient for extracting salient features from input datasets. Compared with the main existing methods, our algorithm has a distinct advantage in both the training speed and the robustness of unsupervised representation learning. Furthermore, the proposed method is easily extended to the supervised or unsupervised model for training deep structure networks.
While Shannon's mutual information has widespread applications in many disciplines, for practical applications it is often difficult to calculate its value accurately for high-dimensional variables because of the curse of dimensionality. This paper is focused on effective approximation methods for evaluating mutual information in the context of neural population coding. For large but finite neural populations, we derive several information-theoretic asymptotic bounds and approximation formulas that remain valid in high-dimensional spaces. We prove that optimizing the population density distribution based on these approximation formulas is a convex optimization problem which allows efficient numerical solutions. Numerical simulation results confirmed that our asymptotic formulas were highly accurate for approximating mutual information for large neural populations. In special cases, the approximation formulas are exactly equal to the true mutual information. We also discuss techniques of variable transformation and dimensionality reduction to facilitate computation of the approximations.
Nov 01 2016 cs.CV
In this paper, we present a simple yet effective Boolean map based representation that exploits connectivity cues for visual tracking. We describe a target object with histogram of oriented gradients and raw color features, of which each one is characterized by a set of Boolean maps generated by uniformly thresholding their values. The Boolean maps effectively encode multi-scale connectivity cues of the target with different granularities. The fine-grained Boolean maps capture spatially structural details that are effective for precise target localization while the coarse-grained ones encode global shape information that are robust to large target appearance variations. Finally, all the Boolean maps form together a robust representation that can be approximated by an explicit feature map of the intersection kernel, which is fed into a logistic regression classifier with online update, and the target location is estimated within a particle filter framework. The proposed representation scheme is computationally efficient and facilitates achieving favorable performance in terms of accuracy and robustness against the state-of-the-art tracking methods on a large benchmark dataset of 50 image sequences.
We study the problem of nonparametric dependence detection. Many existing methods suffer severe power loss due to non-uniform consistency, which we illustrate with a paradox. To avoid such power loss, we approach the nonparametric test of independence through the new framework of binary expansion statistics (BEStat) and binary expansion testing (BET), which examine dependence through a novel binary expansion filtration approximation of the copula. Through a Hadamard-Walsh transform, we find that the cross interactions of binary variables in the filtration are complete sufficient statistics for dependence. These interactions are also uncorrelated under the null. By utilizing these interactions, the BET avoids the problem of non-uniform consistency and improves upon a wide class of commonly used methods (a) by achieving the minimax rate in sample size requirement for specified power and (b) by providing clear interpretations of global and local relationships upon rejection of independence. The binary expansion approach also connects the test statistics with the current computing system to facilitate efficient bitwise implementation. We illustrate the BET by a study of the distribution of stars in the night sky and by an exploratory data analysis of the TCGA breast cancer data.
Oct 06 2016 cs.LG
Beyond its highly publicized victories in Go, there have been numerous successful applications of deep learning in information retrieval, computer vision and speech recognition. In cybersecurity, an increasing number of companies have become excited about the potential of deep learning, and have started to use it for various security incidents, the most popular being malware detection. These companies assert that deep learning (DL) could help turn the tide in the battle against malware infections. However, deep neural networks (DNNs) are vulnerable to adversarial samples, a flaw that plagues most if not all statistical learning models. Recent research has demonstrated that those with malicious intent can easily circumvent deep learning-powered malware detection by exploiting this flaw. In order to address this problem, previous work has developed various defense mechanisms that either augmenting training data or enhance model's complexity. However, after a thorough analysis of the fundamental flaw in DNNs, we discover that the effectiveness of current defenses is limited and, more importantly, cannot provide theoretical guarantees as to their robustness against adversarial sampled-based attacks. As such, we propose a new adversary resistant technique that obstructs attackers from constructing impactful adversarial samples by randomly nullifying features within samples. In this work, we evaluate our proposed technique against a real world dataset with 14,679 malware variants and 17,399 benign programs. We theoretically validate the robustness of our technique, and empirically show that our technique significantly boosts DNN robustness to adversarial samples while maintaining high accuracy in classification. To demonstrate the general applicability of our proposed method, we also conduct experiments using the MNIST and CIFAR-10 datasets, generally used in image recognition research.
Sep 13 2016 cs.CV
Recently deep neural networks based on tanh activation function have shown their impressive power in image denoising. In this letter, we try to use rectifier function instead of tanh and propose a dual-pathway rectifier neural network by combining two rectifier neurons with reversed input and output weights in the same hidden layer. We drive the equivalent activation function and compare it to some typical activation functions for image denoising under the same network architecture. The experimental results show that our model achieves superior performances faster especially when the noise is small.
Anomaly detection plays an important role in modern data-driven security applications, such as detecting suspicious access to a socket from a process. In many cases, such events can be described as a collection of categorical values that are considered as entities of different types, which we call heterogeneous categorical events. Due to the lack of intrinsic distance measures among entities, and the exponentially large event space, most existing work relies heavily on heuristics to calculate abnormal scores for events. Different from previous work, we propose a principled and unified probabilistic model APE (Anomaly detection via Probabilistic pairwise interaction and Entity embedding) that directly models the likelihood of events. In this model, we embed entities into a common latent space using their observed co-occurrence in different events. More specifically, we first model the compatibility of each pair of entities according to their embeddings. Then we utilize the weighted pairwise interactions of different entity types to define the event probability. Using Noise-Contrastive Estimation with "context-dependent" noise distribution, our model can be learned efficiently regardless of the large event space. Experimental results on real enterprise surveillance data show that our methods can accurately detect abnormal events compared to other state-of-the-art abnormal detection techniques.
There are multiple sides to every story, and while statistical topic models have been highly successful at topically summarizing the stories in corpora of text documents, they do not explicitly address the issue of learning the different sides, the viewpoints, expressed in the documents. In this paper, we show how these viewpoints can be learned completely unsupervised and represented in a human interpretable form. We use a novel approach of applying CorrLDA2 for this purpose, which learns topic-viewpoint relations that can be used to form groups of topics, where each group represents a viewpoint. A corpus of documents about the Israeli-Palestinian conflict is then used to demonstrate how a Palestinian and an Israeli viewpoint can be learned. By leveraging the magnitudes and signs of the feature weights of a linear SVM, we introduce a principled method to evaluate associations between topics and viewpoints. With this, we demonstrate, both quantitatively and qualitatively, that the learned topic groups are contextually coherent, and form consistently correct topic-viewpoint associations.
Aug 16 2016 cs.CV
Discriminative model learning for image denoising has been recently attracting considerable attentions due to its favorable denoising performance. In this paper, we take one step forward by investigating the construction of feed-forward denoising convolutional neural networks (DnCNNs) to embrace the progress in very deep architecture, learning algorithm, and regularization method into image denoising. Specifically, residual learning and batch normalization are utilized to speed up the training process as well as boost the denoising performance. Different from the existing discriminative denoising models which usually train a specific model for additive white Gaussian noise (AWGN) at a certain noise level, our DnCNN model is able to handle Gaussian denoising with unknown noise level (i.e., blind Gaussian denoising). With the residual learning strategy, DnCNN implicitly removes the latent clean image in the hidden layers. This property motivates us to train a single DnCNN model to tackle with several general image denoising tasks such as Gaussian denoising, single image super-resolution and JPEG image deblocking. Our extensive experiments demonstrate that our DnCNN model can not only exhibit high effectiveness in several general image denoising tasks, but also be efficiently implemented by benefiting from GPU computing.
Aug 10 2016 cs.CR
Intrusion detection system (IDS) is an important part of enterprise security system architecture. In particular, anomaly-based IDS has been widely applied to detect abnormal process behaviors that deviate from the majority. However, such abnormal behavior usually consists of a series of low-level heterogeneous events. The gap between the low-level events and the high-level abnormal behaviors makes it hard to infer which single events are related to the real abnormal activities, especially considering that there are massive "noisy" low-level events happening in between. Hence, the existing work that focus on detecting single entities/events can hardly achieve high detection accuracy. Different from previous work, we design and implement GID, an efficient graph-based intrusion detection technique that can identify abnormal event sequences from a massive heterogeneous process traces with high accuracy. GID first builds a compact graph structure to capture the interactions between different system entities. The suspiciousness or anomaly score of process paths is then measured by leveraging random walk technique to the constructed acyclic directed graph. To eliminate the score bias from the path length, the Box-Cox power transformation based approach is introduced to normalize the anomaly scores so that the scores of paths of different lengths have the same distribution. The efficiency of suspicious path discovery is further improved by the proposed optimization scheme. We fully implement our GID algorithm and deploy it into a real enterprise security system, and it greatly helps detect the advanced threats, and optimize the incident response. Executing GID on system monitoring datasets showing that GID is efficient (about 2 million records per minute) and accurate (higher than 80% in terms of detection rate).
Aug 10 2016 cs.CV
A residual-networks family with hundreds or even thousands of layers dominates major image recognition tasks, but building a network by simply stacking residual blocks inevitably limits its optimization ability. This paper proposes a novel residual-network architecture, Residual networks of Residual networks (RoR), to dig the optimization ability of residual networks. RoR substitutes optimizing residual mapping of residual mapping for optimizing original residual mapping. In particular, RoR adds level-wise shortcut connections upon original residual networks to promote the learning capability of residual networks. More importantly, RoR can be applied to various kinds of residual networks (ResNets, Pre-ResNets and WRN) and significantly boost their performance. Our experiments demonstrate the effectiveness and versatility of RoR, where it achieves the best performance in all residual-network-like structures. Our RoR-3-WRN58-4+SD models achieve new state-of-the-art results on CIFAR-10, CIFAR-100 and SVHN, with test errors 3.77%, 19.73% and 1.59%, respectively. RoR-3 models also achieve state-of-the-art results compared to ResNets on ImageNet data set.
Jul 26 2016 cs.LG
Matrix sketching is aimed at finding close approximations of a matrix by factors of much smaller dimensions, which has important applications in optimization and machine learning. Given a matrix A of size m by n, state-of-the-art randomized algorithms take O(m * n) time and space to obtain its low-rank decomposition. Although quite useful, the need to store or manipulate the entire matrix makes it a computational bottleneck for truly large and dense inputs. Can we sketch an m-by-n matrix in O(m + n) cost by accessing only a small fraction of its rows and columns, without knowing anything about the remaining data? In this paper, we propose the cascaded bilateral sampling (CABS) framework to solve this problem. We start from demonstrating how the approximation quality of bilateral matrix sketching depends on the encoding powers of sampling. In particular, the sampled rows and columns should correspond to the code-vectors in the ground truth decompositions. Motivated by this analysis, we propose to first generate a pilot-sketch using simple random sampling, and then pursue more advanced, "follow-up" sampling on the pilot-sketch factors seeking maximal encoding powers. In this cascading process, the rise of approximation quality is shown to be lower-bounded by the improvement of encoding powers in the follow-up sampling step, thus theoretically guarantees the algorithmic boosting property. Computationally, our framework only takes linear time and space, and at the same time its performance rivals the quality of state-of-the-art algorithms consuming a quadratic amount of resources. Empirical evaluations on benchmark data fully demonstrate the potential of our methods in large scale matrix sketching and related areas.
We propose a novel supervised learning technique for summarizing videos by automatically selecting keyframes or key subshots. Casting the problem as a structured prediction problem on sequential data, our main idea is to use Long Short-Term Memory (LSTM), a special type of recurrent neural networks to model the variable-range dependencies entailed in the task of video summarization. Our learning models attain the state-of-the-art results on two benchmark video datasets. Detailed analysis justifies the design of the models. In particular, we show that it is crucial to take into consideration the sequential structures in videos and model them. Besides advances in modeling techniques, we introduce techniques to address the need of a large number of annotated data for training complex learning models. There, our main idea is to exploit the existence of auxiliary annotated video datasets, albeit heterogeneous in visual styles and contents. Specifically, we show domain adaptation techniques can improve summarization by reducing the discrepancies in statistical properties across those datasets.
May 24 2016 cs.CR
In this paper, we present that security threats coming with existing GPU memory management strategy are overlooked, which opens a back door for adversaries to freely break the memory isolation: they enable adversaries without any privilege in a computer to recover the raw memory data left by previous processes directly. More importantly, such attacks can work on not only normal multi-user operating systems, but also cloud computing platforms. To demonstrate the seriousness of such attacks, we recovered original data directly from GPU memory residues left by exited commodity applications, including Google Chrome, Adobe Reader, GIMP, Matlab. The results show that, because of the vulnerable memory management strategy, commodity applications in our experiments are all affected.
Tensor factorization is a powerful tool to analyse multi-way data. Compared with traditional multi-linear methods, nonlinear tensor factorization models are capable of capturing more complex relationships in the data. However, they are computationally expensive and may suffer severe learning bias in case of extreme data sparsity. To overcome these limitations, in this paper we propose a distributed, flexible nonlinear tensor factorization model. Our model can effectively avoid the expensive computations and structural restrictions of the Kronecker-product in existing TGP formulations, allowing an arbitrary subset of tensorial entries to be selected to contribute to the training. At the same time, we derive a tractable and tight variational evidence lower bound (ELBO) that enables highly decoupled, parallel computations and high-quality inference. Based on the new bound, we develop a distributed inference algorithm in the MapReduce framework, which is key-value-free and can fully exploit the memory cache mechanism in fast MapReduce systems such as SPARK. Experimental results fully demonstrate the advantages of our method over several state-of-the-art approaches, in terms of both predictive performance and computational efficiency. Moreover, our approach shows a promising potential in the application of Click-Through-Rate (CTR) prediction for online advertising.
Apr 12 2016 cs.CV
Face detection and alignment in unconstrained environment are challenging due to various poses, illuminations and occlusions. Recent studies show that deep learning approaches can achieve impressive performance on these two tasks. In this paper, we propose a deep cascaded multi-task framework which exploits the inherent correlation between them to boost up their performance. In particular, our framework adopts a cascaded structure with three stages of carefully designed deep convolutional networks that predict face and landmark location in a coarse-to-fine manner. In addition, in the learning process, we propose a new online hard sample mining strategy that can improve the performance automatically without manual sample selection. Our method achieves superior accuracy over the state-of-the-art techniques on the challenging FDDB and WIDER FACE benchmark for face detection, and AFLW benchmark for face alignment, while keeps real time performance.
Learning the influence structure of multiple time series data is of great interest to many disciplines. This paper studies the problem of recovering the causal structure in network of multivariate linear Hawkes processes. In such processes, the occurrence of an event in one process affects the probability of occurrence of new events in some other processes. Thus, a natural notion of causality exists between such processes captured by the support of the excitation matrix. We show that the resulting causal influence network is equivalent to the Directed Information graph (DIG) of the processes, which encodes the causal factorization of the joint distribution of the processes. Furthermore, we present an algorithm for learning the support of excitation matrix (or equivalently the DIG). The performance of the algorithm is evaluated on synthesized multivariate Hawkes networks as well as a stock market and MemeTracker real-world dataset.
Mar 11 2016 cs.CV
Video summarization has unprecedented importance to help us digest, browse, and search today's ever-growing video collections. We propose a novel subset selection technique that leverages supervision in the form of human-created summaries to perform automatic keyframe-based video summarization. The main idea is to nonparametrically transfer summary structures from annotated videos to unseen test videos. We show how to extend our method to exploit semantic side information about the video's category/genre to guide the transfer process by those training videos semantically consistent with the test input. We also show how to generalize our method to subshot-based summarization, which not only reduces computational costs but also provides more flexible ways of defining visual similarity across subshots spanning several frames. We conduct extensive evaluation on several benchmarks and demonstrate promising results, outperforming existing methods in several settings.
We study the spherical cap packing problem with a probabilistic approach. Such probabilistic considerations result in an asymptotic sharp universal uniform bound on the maximal inner product between any set of unit vectors and a stochastically independent uniformly distributed unit vector. When the set of unit vectors are themselves independently uniformly distributed, we further develop the extreme value distribution limit of the maximal inner product, which characterizes its uncertainty around the bound. As applications of the above asymptotic results, we derive (1) an asymptotic sharp universal uniform bound on the maximal spurious correlation, as well as its uniform convergence in distribution when the explanatory variables are independently Gaussian distributed; and (2) an asymptotic sharp universal bound on the maximum norm of a low-rank elliptically distributed vector, as well as related limiting distributions. With these results, we develop a fast detection method for a low-rank structure in high-dimensional Gaussian data without using the spectrum information.
It is commonplace to encounter nonstationary data, of which the underlying generating process may change over time or across domains. The nonstationarity presents both challenges and opportunities for causal discovery. In this paper we propose a principled framework to handle nonstationarity, and develop some methods to address three important questions. First, we propose an enhanced constraint-based method to detect variables whose local mechanisms are nonstationary and recover the skeleton of the causal structure over observed variables. Second, we present a way to determine some causal directions by taking advantage of information carried by changing distributions. Third, we develop a method for visualizing the nonstationarity of causal modules. Experimental results on various synthetic and real-world data sets are presented to demonstrate the efficacy of our methods.
A fog computing based radio access network (F-RAN) is presented in this article as a promising paradigm for the fifth generation (5G) wireless communication system to provide high spectral and energy efficiency. The core idea is to take full advantages of local radio signal processing, cooperative radio resource management, and distributed storing capabilities in edge devices, which can decrease the heavy burden on fronthaul and avoid large-scale radio signal processing in the centralized baseband unit pool. This article comprehensively presents the system architecture and key techniques of F-RANs. In particular, key techniques and their corresponding solutions, including transmission mode selection and interference suppression, are discussed. Open issues in terms of edge caching, software-defined networking, and network function virtualization, are also identified.
Recent developments in structural equation modeling have produced several methods that can usually distinguish cause from effect in the two-variable case. For that purpose, however, one has to impose substantial structural constraints or smoothness assumptions on the functional causal models. In this paper, we consider the problem of determining the causal direction from a related but different point of view, and propose a new framework for causal direction determination. We show that it is possible to perform causal inference based on the condition that the cause is "exogenous" for the parameters involved in the generating process from the cause to the effect. In this way, we avoid the structural constraints required by the SEM-based approaches. In particular, we exploit nonparametric methods to estimate marginal and conditional distributions, and propose a bootstrap-based approach to test for the exogeneity condition; the testing results indicate the causal direction between two variables. The proposed method is validated on both synthetic and real data.
Apr 22 2015 cs.CV
Recently, the compressive tracking (CT) method has attracted much attention due to its high efficiency, but it cannot well deal with the large scale target appearance variations due to its data-independent random projection matrix that results in less discriminative features. To address this issue, in this paper we propose an adaptive CT approach, which selects the most discriminative features to design an effective appearance model. Our method significantly improves CT in three aspects: Firstly, the most discriminative features are selected via an online vector boosting method. Secondly, the object representation is updated in an effective online manner, which preserves the stable features while filtering out the noisy ones. Finally, a simple and effective trajectory rectification approach is adopted that can make the estimated location more accurate. Extensive experiments on the CVPR2013 tracking benchmark demonstrate the superior performance of our algorithm compared over state-of-the-art tracking algorithms.
Apr 10 2015 cs.SI
The proliferation of mobile handheld devices in combination with the technological advancements in mobile computing has led to a number of innovative services that make use of the location information available on such devices. Traditional yellow pages websites have now moved to mobile platforms, giving the opportunity to local businesses and potential, near-by, customers to connect. These platforms can offer an affordable advertisement channel to local businesses. One of the mechanisms offered by location-based social networks (LBSNs) allows businesses to provide special offers to their customers that connect through the platform. We collect a large time-series dataset from approximately 14 million venues on Foursquare and analyze the performance of such campaigns using randomization techniques and (non-parametric) hypothesis testing with statistical bootstrapping. Our main finding indicates that this type of promotions are not as effective as anecdote success stories might suggest. Finally, we design classifiers by extracting three different types of features that are able to provide an educated decision on whether a special offer campaign for a local business will succeed or not both in short and long term.
Mar 27 2015 cs.NI
Increasing sources of sensor measurements and prior knowledge have become available for indoor localization on smartphones. How to effectively utilize these sources for enhancing localization accuracy is an important yet challenging problem. In this paper, we present an area state-aided localization algorithm that exploits various sources of information. Specifically, we introduce the concept of area state to indicate the area where the user is on an indoor map. The position of the user is then estimated using inertial measurement unit (IMU) measurements with the aid of area states. The area states are in turn updated based on the position estimates. To avoid accumulated errors of IMU measurements, our algorithm uses WiFi received signal strength indicator (RSSI) to indicate the vicinity of the user to the routers. The experiment results show that our system can achieve satisfactory localization accuracy in a typical indoor environment.
Jan 20 2015 cs.CV
Deep networks have been successfully applied to visual tracking by learning a generic representation offline from numerous training images. However the offline training is time-consuming and the learned generic representation may be less discriminative for tracking specific objects. In this paper we present that, even without offline training with a large amount of auxiliary data, simple two-layer convolutional networks can be powerful enough to develop a robust representation for visual tracking. In the first frame, we employ the k-means algorithm to extract a set of normalized patches from the target region as fixed filters, which integrate a series of adaptive contextual filters surrounding the target to define a set of feature maps in the subsequent frames. These maps measure similarities between each filter and the useful local intensity patterns across the target, thereby encoding its local structural information. Furthermore, all the maps form together a global representation, which is built on mid-level features, thereby remaining close to image-level information, and hence the inner geometric layout of the target is also well preserved. A simple soft shrinkage method with an adaptive threshold is employed to de-noise the global representation, resulting in a robust sparse representation. The representation is updated via a simple and effective online strategy, allowing it to robustly adapt to target appearance variations. Our convolution networks have surprisingly lightweight structure, yet perform favorably against several state-of-the-art methods on the CVPR2013 tracking benchmark dataset with 50 challenging videos.
Dec 30 2014 cs.DS
The primary structure of a ribonucleic acid (RNA) molecule can be represented as a sequence of nucleotides (bases) over the alphabet A, C, G, U. The secondary or tertiary structure of an RNA is a set of base pairs which form bonds between A-U and G-C. For secondary structures, these bonds have been traditionally assumed to be one-to-one and non-crossing. This paper considers pattern matching as well as local alignment between two RNA structures. For pattern matching, we present two algorithms, one for obtaining an exact match, the other for approximate match. We then present an algorithm for RNA local structural alignment.
Taking full advantages of both heterogeneous networks (HetNets) and cloud access radio access networks (CRANs), heterogeneous cloud radio access networks (H-CRANs) are presented to enhance both the spectral and energy efficiencies, where remote radio heads (RRHs) are mainly used to provide high data rates for users with high quality of service (QoS) requirements, while the high power node (HPN) is deployed to guarantee the seamless coverage and serve users with low QoS requirements. To mitigate the inter-tier interference and improve EE performances in H-CRANs, characterizing user association with RRH/HPN is considered in this paper, and the traditional soft fractional frequency reuse (S-FFR) is enhanced. Based on the RRH/HPN association constraint and the enhanced S-FFR, an energy-efficient optimization problem with the resource assignment and power allocation for the orthogonal frequency division multiple access (OFDMA) based H-CRANs is formulated as a non-convex objective function. To deal with the non-convexity, an equivalent convex feasibility problem is reformulated, and closedform expressions for the energy-efficient resource allocation solution to jointly allocate the resource block and transmit power are derived by the Lagrange dual decomposition method. Simulation results confirm that the H-CRAN architecture and the corresponding resource allocation solution can enhance the energy efficiency significantly.
Dec 01 2014 cs.NI
Mobile sensing has become a promising paradigm for mobile users to obtain information by task crowdsourcing. However, due to the social preferences of mobile users, the quality of sensing reports may be impacted by the underlying social attributes and selfishness of individuals. Therefore, it is crucial to consider the social impacts and trustworthiness of mobile users when selecting task participants in mobile sensing. In this paper, we propose a Social Aware Crowdsourcing with Reputation Management (SACRM) scheme to select the well-suited participants and allocate the task rewards in mobile sensing. Specifically, we consider the social attributes, task delay and reputation in crowdsourcing and propose a participant selection scheme to choose the well-suited participants for the sensing task under a fixed task budget. A report assessment and rewarding scheme is also introduced to measure the quality of the sensing reports and allocate the task rewards based the assessed report quality. In addition, we develop a reputation management scheme to evaluate the trustworthiness and cost performance ratio of mobile users for participant selection. Theoretical analysis and extensive simulations demonstrate that SACRM can efficiently improve the crowdsourcing utility and effectively stimulate the participants to improve the quality of their sensing reports.
The very original concept of cognitive radio (CR) raised by Mitola targets at all the environment parameters, including those in physical layer, MAC layer, application layer as well as the information extracted from reasoning. Hence the first CR is also referred to as "full cognitive radio". However, due to its difficult implementation, FCC and Simon Haykin separately proposed a much more simplified definition, in which CR mainly detects one single parameter, i.e., spectrum occupancy, and is also called as "spectrum sensing cognitive radio". With the rapid development of wireless communication, the infrastructure of a wireless system becomes much more complicated while the functionality at every node is desired to be as intelligent as possible, say the self-organized capability in the approaching 5G cellular networks. It is then interesting to re-look into Mitola's definition and think whether one could, besides obtaining the "on/off" status of the licensed user only, achieve more parameters in a cognitive way. In this article, we propose a new cognitive architecture targeting at multiple parameters in future cellular networks, which is a one step further towards the "full cognition" compared to the most existing CR research. The new architecture is elaborated in detailed stages, and three representative examples are provided based on the recent research progress to illustrate the feasibility as well as the validity of the proposed architecture.
In this paper, we discuss the method of Bayesian regression and its efficacy for predicting price variation of Bitcoin, a recently popularized virtual, cryptographic currency. Bayesian regression refers to utilizing empirical data as proxy to perform Bayesian inference. We utilize Bayesian regression for the so-called "latent source model". The Bayesian regression for "latent source model" was introduced and discussed by Chen, Nikolov and Shah (2013) and Bresler, Chen and Shah (2014) for the purpose of binary classification. They established theoretical as well as empirical efficacy of the method for the setting of binary classification. In this paper, instead we utilize it for predicting real-valued quantity, the price of Bitcoin. Based on this price prediction method, we devise a simple strategy for trading Bitcoin. The strategy is able to nearly double the investment in less than 60 day period when run against real data trace.