Nov 21 2017 cs.CV
Imaging objects that are obscured by scattering and occlusion is an important challenge for many applications. For example, navigation and mapping capabilities of autonomous vehicles could be improved, vision in harsh weather conditions or under water could be facilitated, or search and rescue scenarios could become more effective. Unfortunately, conventional cameras cannot see around corners. Emerging time-resolved computational imaging systems, however, have demonstrated first steps towards non-line-of-sight (NLOS) imaging. In this paper, we develop an algorithmic framework for NLOS imaging that is robust to partial occlusions within the hidden scenes. This is a common light transport effect, but one that existing NLOS reconstruction algorithms do not adequately handle, fundamentally limiting the types of scenes that can be recovered. We demonstrate state-of-the-art NLOS reconstructions in simulation and with a prototype single photon avalanche diode (SPAD) based acquisition system.
Oct 12 2017 cs.CV
Due to their fast inference and good performance, discriminative learning methods have been widely studied in image denoising. However, these methods mostly learn a specific model for each noise level, and require multiple models for denoising images with different noise levels. They also lack the flexibility to deal with spatially variant noise, limiting their applications in practical denoising. To address these issues, we present a fast and flexible denoising convolutional neural network, namely FFDNet, with a tunable noise level map as the input. The proposed FFDNet works on downsampled sub-images to speed up the inference, and adopts orthogonal regularization to enhance the generalization ability. In contrast to existing discriminative denoisers, FFDNet enjoys several desirable properties, including (i) the ability to handle a wide range of noise levels (i.e., [0, 75]) effectively with a single network, (ii) the ability to remove spatially variant noise by specifying a non-uniform noise level map, and (iii) faster speed than benchmark BM3D even on CPU without sacrificing denoising performance. Extensive experiments on synthetic and real noisy images are conducted to evaluate FFDNet in comparison with state-of-the-art denoisers. The results show that FFDNet is effective and efficient, making it highly attractive for practical denoising applications.
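To make the pipeline concrete, here is a minimal PyTorch sketch of the input handling the abstract describes; the layer counts, module names, and the use of pixel (un)shuffle for the sub-image rearrangement are our assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FFDNetSketch(nn.Module):
    """Sketch: denoise 4 downsampled sub-images conditioned on a noise map."""
    def __init__(self, channels=1, features=64, layers=15):
        super().__init__()
        c_in = channels * 4 + 1          # 4 sub-images + 1 noise-level channel
        body = [nn.Conv2d(c_in, features, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(layers - 2):
            body += [nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True)]
        body += [nn.Conv2d(features, channels * 4, 3, padding=1)]
        self.body = nn.Sequential(*body)

    def forward(self, noisy, sigma_map):
        # noisy: (N, C, H, W); sigma_map: (N, 1, H/2, W/2), the tunable input
        sub = F.pixel_unshuffle(noisy, 2)            # (N, 4C, H/2, W/2)
        out = self.body(torch.cat([sub, sigma_map], dim=1))
        return F.pixel_shuffle(out, 2)               # back to (N, C, H, W)

out = FFDNetSketch()(torch.randn(1, 1, 64, 64), torch.full((1, 1, 32, 32), 25 / 255))
```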
Oct 10 2017 cs.CV
Automatically predicting age group and gender from face images acquired in unconstrained conditions is an important and challenging task in many real-world applications. Nevertheless, conventional methods with manually designed features perform unsatisfactorily on in-the-wild benchmarks because they cannot tackle the large variations in unconstrained images. This difficulty is alleviated to some degree by Convolutional Neural Networks (CNNs), thanks to their powerful feature representations. In this paper, we propose a new CNN-based method for age group and gender estimation leveraging Residual Networks of Residual Networks (RoR), which exhibits better optimization ability for age group and gender classification than other CNN architectures. Moreover, two modest mechanisms based on observations of the characteristics of age groups are presented to further improve the performance of age estimation. To further improve performance and alleviate over-fitting, the RoR model is first pre-trained on ImageNet, then fine-tuned on the IMDB-WIKI-101 data set to further learn the features of face images, and finally fine-tuned on the Adience data set. Our experiments illustrate the effectiveness of the RoR method for age and gender estimation in the wild, where it achieves better performance than other CNN methods. Finally, RoR-152+IMDB-WIKI-101 with the two mechanisms achieves new state-of-the-art results on the Adience benchmark.
Oct 03 2017 cs.CV
The Residual Networks of Residual Networks (RoR) model exhibits excellent performance on image classification tasks, but sharply increasing the number of feature map channels makes the transmission of characteristic information incoherent, which loses some information relevant to classification prediction and limits classification performance. In this paper, a Pyramidal RoR network model is proposed by analysing the performance characteristics of RoR and combining it with PyramidNet. Firstly, based on RoR, the Pyramidal RoR network model with gradually increasing channels is designed. Secondly, we analysed the effect of different residual block structures on performance, and chose the residual block structure that best favours classification performance. Finally, to further optimize Pyramidal RoR networks, drop-path is used to avoid over-fitting and save training time. Image classification experiments were performed on the CIFAR-10/100 and SVHN datasets, where we achieved the lowest classification error rates reported at the time: 2.96%, 16.40% and 1.59%, respectively. Experiments show that the Pyramidal RoR network optimization method can improve network performance across different data sets and effectively suppress the vanishing gradient problem in DCNN training.
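As an illustration of the "gradually increasing channels" idea, the toy schedule below contrasts a PyramidNet-style linear widening with the abrupt per-stage doubling inherited from ResNets; the specific numbers are ours, chosen only for the example.

```python
# A minimal sketch (ours, under the PyramidNet-style assumption of linear
# channel growth) contrasting two channel schedules across residual blocks.
def pyramidal_widths(n_blocks, base=16, alpha=48):
    """Channel count of each residual block when widening linearly by alpha."""
    return [round(base + alpha * k / n_blocks) for k in range(1, n_blocks + 1)]

def stagewise_widths(n_blocks, base=16):
    """Conventional schedule: channels double at each third of the network."""
    return [base * (2 ** (3 * (k - 1) // n_blocks)) for k in range(1, n_blocks + 1)]

print(pyramidal_widths(9))   # smooth:  [21, 27, 32, 37, 43, 48, 53, 59, 64]
print(stagewise_widths(9))   # abrupt:  [16, 16, 16, 32, 32, 32, 64, 64, 64]
```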
Oct 02 2017 cs.LG
Rule extraction from black-box models is critical in domains that require model validation before implementation, as can be the case in credit scoring and medical diagnosis. Though already a challenging problem in statistical learning in general, the difficulty is even greater when highly non-linear, recursive models, such as recurrent neural networks (RNNs), are fit to data. Here, we study the extraction of rules from second order recurrent neural networks trained to recognize the Tomita grammars. We show that production rules can be stably extracted from trained RNNs and that in certain cases the rules outperform the trained RNNs.
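For readers unfamiliar with the model class, the following numpy sketch (our construction, with arbitrary weights) shows one step of a second-order RNN: the transition is a bilinear form in the previous state and the one-hot input symbol, so each (state, symbol) pair effectively selects its own transition weights, which is what makes DFA-style production rules extractable from the state space.

```python
import numpy as np

def second_order_step(W, b, h, x):
    # W: (H, H, A) weight tensor; h: (H,) state; x: (A,) one-hot input symbol
    pre = np.einsum('ijk,j,k->i', W, h, x) + b
    return 1.0 / (1.0 + np.exp(-pre))            # sigmoid state update

rng = np.random.default_rng(0)
H, A = 4, 2                                      # state size, binary alphabet
W, b = rng.normal(size=(H, H, A)), np.zeros(H)
h = np.full(H, 0.5)                              # initial state
for symbol in [0, 1, 1, 0]:                      # a string over {0, 1}
    h = second_order_step(W, b, h, np.eye(A)[symbol])
print(h)
```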
Understanding human mobility is critical for decision support in areas from urban planning to infectious disease control. Prior work has focused on tracking daily logs of outdoor mobility without considering relevant context; such logs contain a mixture of regular and irregular human movement for a range of purposes, so diverse effects on the dynamics have been ignored. This study focuses on the irregular movement of different meta-populations with various travel purposes. We propose approaches to estimate the predictability of mobility in different contexts. With our survey data from international and domestic visitors to Australia, we found that the travel patterns of Europeans visiting for holidays are less predictable than those visiting for education, while East Asian visitors show the opposite pattern, i.e., they are more predictable for holidays than for education. Domestic residents from the most populous Australian states exhibit the most unpredictable patterns, while visitors from less populated states show the most predictable movement.
Device-to-device (D2D) communications have recently attracted much attention for their potential to improve spectral efficiency underlaying existing heterogeneous networks (HetNets). Without sophisticated control, D2D user equipments (DUEs) cannot by themselves resist eavesdropping or other security attacks, so it is urgent to maximize the secure capacity for both cellular users and DUEs. This paper formulates the radio resource allocation problem of maximizing the secure capacity of DUEs for D2D communication underlaying HetNets, which consist of high power nodes and low power nodes. The optimization objective function with transmit bit rate and power constraints, which is non-convex and hard to solve directly, is first transformed into matrix form. The equivalent convex form of the optimization problem is then derived according to Perron-Frobenius theory. A heuristic iterative algorithm based on proximal theory is proposed to solve this equivalent convex problem by evaluating the proximal operator of the Lagrange function. Numerical results show that the proposed radio resource allocation solution significantly improves the secure capacity with fast convergence.
As a promising paradigm for the fifth generation wireless communication (5G) system, the fog radio access network (F-RAN) has been proposed as an advanced socially-aware mobile networking architecture that provides high spectral efficiency (SE) while maintaining high energy efficiency (EE) and low latency. Recent efforts have been devoted to performance analysis and radio resource allocation, both of which are fundamental to a successful rollout of F-RANs. This article comprehensively summarizes recent advances in performance analysis and radio resource allocation in F-RANs. In particular, advanced edge caching and adaptive mode selection schemes are presented to improve SE and EE while maintaining a low latency level. Radio resource allocation strategies to optimize SE and EE in F-RANs are proposed, respectively. A few open issues concerning the F-RAN based 5G architecture and the social-awareness technique are identified as well.
A negative afterimage appears in our vision when we shift our gaze from an overstimulated original image to a new area with a uniform color. The colors of negative afterimages differ from the old stimulating colors in the original image when the color in the new area is either neutral or chromatic. The interaction between stimulating colors in the test and inducing fields of the original image changes our color perception due to simultaneous contrast, and the interaction between the changed colors perceived in the previously-viewed field and the color in the currently-viewed field also affects our perception of colors in negative afterimages due to successive contrast. Based on these observations, we propose a computational model to estimate the colors of negative afterimages in more general cases, where the original stimulating color in the test field is chromatic, and the original stimulating color in the inducing field and the new stimulating color can be either neutral or chromatic. We validate our model with human experiments.
In this paper, we investigate multi-message authentication to combat adversaries with infinite computational capacity. An authentication framework over a wiretap channel $(W_1,W_2)$ is proposed to achieve information-theoretic security with the same key. The proposed framework bridges two research areas in physical (PHY) layer security: secure transmission and message authentication. Specifically, the sender Alice first transmits message $M$ to the receiver Bob over $(W_1,W_2)$ with an error correction code; then Alice employs a hash function (i.e., an $\varepsilon$-AWU$_2$ hash function) to generate a message tag $S$ of message $M$ using key $K$, and encodes $S$ to a codeword $X^n$ by leveraging an existing strongly secure channel coding with exponentially small (in code length $n$) average probability of error; finally, Alice sends $X^n$ over $(W_1,W_2)$ to Bob, who authenticates the received messages. We develop a theorem on the conditions required for the authentication framework to be information-theoretically secure for authenticating a number of messages polynomial in $n$. Based on this theorem, we propose an authentication protocol that guarantees these security requirements, and prove that its authentication rate approaches infinity as $n$ goes to infinity. Furthermore, we design and implement an efficient and feasible authentication protocol over the binary symmetric wiretap channel (BSWC) by using Linear Feedback Shift Register based (LFSR-based) hash functions and strongly secure polar codes. Extensive experiments demonstrate that the proposed protocol achieves low time cost, high authentication rate, and low authentication error rate.
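The message flow of the framework can be sketched as follows; HMAC-SHA256 here is only a stand-in for the paper's LFSR-based $\varepsilon$-AWU$_2$ hash family, and the wiretap and secure channel encoders are abstracted away, so only the tag-then-encode structure is faithful.

```python
import hmac, hashlib

def make_tag(key: bytes, message: bytes) -> bytes:
    # stand-in keyed hash; the paper uses an LFSR-based epsilon-AWU_2 family
    return hmac.new(key, message, hashlib.sha256).digest()

def alice_send(key, message):
    tag = make_tag(key, message)
    # in the paper: message sent via an error correction code over (W1, W2),
    # tag encoded with a strongly secure channel code before transmission
    return message, tag

def bob_verify(key, received_message, received_tag) -> bool:
    return hmac.compare_digest(make_tag(key, received_message), received_tag)

key = b"shared-secret-key"
msg, tag = alice_send(key, b"hello Bob")
assert bob_verify(key, msg, tag)
assert not bob_verify(key, b"tampered", tag)
```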
Jul 20 2017 cs.CV
Identity transformations, used as skip-connections in residual networks, directly connect convolutional layers close to the input with those close to the output in deep neural networks, improving information flow and thus easing the training. In this paper, we introduce two alternative linear transforms: orthogonal transformations and idempotent transformations. By the definitions and properties of orthogonal and idempotent matrices, the product of multiple orthogonal (respectively, the same idempotent) matrices used to form these linear transformations is equal to a single orthogonal (idempotent) matrix, so information flow is improved and training is eased. One interesting point is that the success essentially stems from feature reuse and gradient reuse in forward and backward propagation, which maintain information during the flow and eliminate the vanishing gradient problem thanks to the express way formed by skip-connections. We empirically demonstrate the effectiveness of the proposed transformations: they achieve similar performance to identity transformations in single-branch networks and even superior performance in multi-branch networks.
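The algebraic facts the argument rests on are easy to check numerically; the snippet below (ours) verifies that a product of orthogonal matrices is again orthogonal, and that repeated products of one idempotent matrix collapse to that matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Orthogonal: QR factorizations of random matrices yield orthogonal Q factors.
Qs = [np.linalg.qr(rng.normal(size=(5, 5)))[0] for _ in range(4)]
P = np.linalg.multi_dot(Qs)
assert np.allclose(P.T @ P, np.eye(5))       # the product is again orthogonal

# Idempotent: an orthogonal projector M onto col(A) satisfies M @ M == M.
A = rng.normal(size=(5, 3))
M = A @ np.linalg.pinv(A)
assert np.allclose(M @ M, M)
assert np.allclose(M @ M @ M, M)             # repeated products stay M
```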
While autoencoders are a key technique in representation learning for continuous structures, such as images or wave forms, developing general-purpose autoencoders for discrete structures, such as text sequences or discretized images, has proven to be more challenging. In particular, discrete inputs make it more difficult to learn a smooth encoder that preserves the complex local relationships in the input space. In this work, we propose an adversarially regularized autoencoder (ARAE) with the goal of learning more robust discrete-space representations. ARAE jointly trains both a rich discrete-space encoder, such as an RNN, and a simpler continuous-space generator function, while using generative adversarial network (GAN) training to constrain the two distributions to be similar. This method yields a smoother contracted code space that maps similar inputs to nearby codes, and also an implicit latent variable GAN model for generation. Experiments on text and discretized images demonstrate that the GAN model produces clean interpolations and captures the multimodality of the original space, and that the autoencoder produces improvements in semi-supervised learning as well as state-of-the-art results in the unaligned text style transfer task using only a shared continuous-space representation.
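A schematic training step, as we read the abstract, looks like the following; the toy encoder/decoder/generator/critic modules and dimensions are ours (the paper uses an RNN encoder), and only the three-loss structure is meant to be faithful.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
V, L, C, Z = 50, 10, 16, 8          # vocab, sequence length, code dim, noise dim
enc = nn.Sequential(nn.Embedding(V, 32), nn.Flatten(1), nn.Linear(32 * L, C))
dec = nn.Linear(C, V * L)
gen = nn.Sequential(nn.Linear(Z, 64), nn.ReLU(), nn.Linear(64, C))
critic = nn.Sequential(nn.Linear(C, 64), nn.ReLU(), nn.Linear(64, 1))
opt_ae = torch.optim.Adam([*enc.parameters(), *dec.parameters()], 1e-3)
opt_c = torch.optim.Adam(critic.parameters(), 1e-3)
opt_g = torch.optim.Adam(gen.parameters(), 1e-3)

x = torch.randint(0, V, (4, L))     # a batch of "discrete structures"
z = torch.randn(4, Z)

# 1) autoencoder: reconstruct the discrete input from its continuous code
logits = dec(enc(x)).view(4, L, V)
rec = F.cross_entropy(logits.flatten(0, 1), x.flatten())
opt_ae.zero_grad(); rec.backward(); opt_ae.step()

# 2) critic: separate real codes from generated codes (WGAN-style losses)
c_loss = critic(gen(z).detach()).mean() - critic(enc(x).detach()).mean()
opt_c.zero_grad(); c_loss.backward(); opt_c.step()

# 3) generator: fool the critic, pulling the two code distributions together
g_loss = -critic(gen(z)).mean()
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```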
Measurement error in the observed values of the variables can greatly change the output of various causal discovery methods. This problem has received much attention in multiple fields, but it is not clear to what extent the causal model for the measurement-error-free variables can be identified in the presence of measurement error with unknown variance. In this paper, we study precise sufficient identifiability conditions for the measurement-error-free causal model and show what information about the causal model can be recovered from observed data. In particular, we present two different sets of identifiability conditions, based on the second-order statistics and higher-order statistics of the data, respectively. The former is inspired by the relationship between the generating model of the measurement-error-contaminated data and the factor analysis model, and the latter makes use of the identifiability result for the over-complete independent component analysis problem.
We study causal inference in a multi-environment setting, in which the functional relations for producing the variables from their direct causes remain the same across environments, while the distribution of exogenous noises may vary. We introduce the idea of using the invariance of the functional relations of the variables to their causes across a set of environments. We define a notion of completeness for a causal inference algorithm in this setting and prove the existence of such an algorithm by proposing a baseline algorithm. Additionally, we present an alternative algorithm that has significantly improved computational and sample complexity compared to the baseline algorithm. Experimental results show that the proposed algorithm outperforms existing algorithms.
It is often impossible to understand how machine learning models reach a decision. While recent research has proposed various technical approaches that provide some clues as to how a learning model makes individual decisions, they cannot give users the ability to inspect a learning model as a complete entity. In this work, we propose a new technical approach that augments a Bayesian regression mixture model with multiple elastic nets. Using the enhanced mixture model, we extract explanations for a target model through global approximation. To demonstrate the utility of our approach, we evaluate it on different learning models covering the tasks of text mining and image recognition. Our results indicate that the proposed approach not only outperforms the state-of-the-art technique in explaining individual decisions but also gives users the ability to discover the vulnerabilities of a learning model.
In the coded caching framework proposed by Maddah-Ali and Niesen, two classes of coding schemes are known in the literature, namely uncoded prefetching schemes and coded prefetching schemes. In this work, we provide a connection between the uncoded prefetching scheme proposed by Maddah-Ali and Niesen (and its improved version by Yu et al.) and the coded prefetching scheme proposed by Tian and Chen, when the number of files is no larger than the number of users. We make the critical observation that a coding component in the Tian-Chen scheme can be replaced by a binary code, which enables us to view the two schemes as the extremes of a more general scheme. The intermediate operating points of this general scheme can in fact provide new tradeoff points previously unknown in the literature; however, explicitly characterizing the performance of this general scheme appears rather difficult.
Apr 12 2017 cs.CV
Model-based optimization methods and discriminative learning methods have been the two dominant strategies for solving various inverse problems in low-level vision. Typically, the two kinds of methods have their respective merits and drawbacks; e.g., model-based optimization methods are flexible for handling different inverse problems but are usually time-consuming, requiring sophisticated priors for good performance, while discriminative learning methods have fast testing speed but their application range is greatly restricted by the specialized task. Recent works have revealed that, with the aid of variable splitting techniques, a denoiser prior can be plugged in as a modular part of model-based optimization methods to solve other inverse problems (e.g., deblurring). Such an integration brings considerable advantages when the denoiser is obtained via discriminative learning. However, the study of integration with fast discriminative denoiser priors is still lacking. To this end, this paper aims to train a set of fast and effective CNN (convolutional neural network) denoisers and integrate them into model-based optimization methods to solve other inverse problems. Experimental results demonstrate that the learned set of denoisers not only achieves promising Gaussian denoising results but can also be used as a prior to deliver good performance for various low-level vision applications.
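The variable-splitting integration can be sketched in a few lines; the half-quadratic splitting form, step sizes, and iteration counts below are our illustrative choices, and `denoise` is a placeholder for one of the learned CNN denoisers.

```python
import numpy as np

def plug_and_play_hqs(y, H, Ht, denoise, mu=0.5, sigma=10.0, iters=30):
    """Half-quadratic splitting: alternate a data-fit step with a denoiser call.

    H, Ht: forward degradation operator and its adjoint (callables on arrays);
    denoise: callable(image, noise_level), the plugged-in learned denoiser.
    """
    x = Ht(y)                                   # crude initialization
    z = x.copy()
    for _ in range(iters):
        # data sub-problem: gradient steps on ||y - Hx||^2 + mu * ||x - z||^2
        for _ in range(5):
            x = x - 0.1 * (Ht(H(x) - y) + mu * (x - z))
        # prior sub-problem: the denoiser plays the role of the proximal map
        z = denoise(x, sigma / np.sqrt(mu))
    return z

# toy run with an identity degradation and an identity "denoiser" placeholder
identity = lambda v: v
x_hat = plug_and_play_hqs(np.ones((8, 8)), identity, identity, lambda v, s: v)
```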
Measuring conditional dependencies among the variables of a network is of great interest to many disciplines. This paper studies two shortcomings of existing dependency measures, namely failures in detecting direct causal influences and a lack of ability to perform group selection to capture strong dependencies, and accordingly introduces a new statistical dependency measure to overcome them. This measure is inspired by Dobrushin's coefficients and is based on the fact that there is no dependency between $X$ and $Y$ given another variable $Z$ if and only if the conditional distribution of $Y$ given $X=x$ and $Z=z$ does not change when $X$ takes another realization $x'$ while $Z$ takes the same realization $z$. We show the advantages of this measure over the related measures in the literature. Moreover, we establish a connection between our measure and the integral probability metric (IPM), which helps to develop estimators of the measure with lower complexity compared to other relevant information-theoretic measures. Finally, we show the performance of this measure through numerical simulations.
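Rendered in symbols, the characterization quoted above reads:

```latex
% X and Y are conditionally independent given Z exactly when changing the
% realization of X, holding Z fixed, leaves the conditional law of Y intact:
\[
  X \perp\!\!\!\perp Y \mid Z
  \;\iff\;
  P_{Y \mid X=x,\,Z=z} = P_{Y \mid X=x',\,Z=z}
  \quad \text{for all realizations } x,\, x',\, z .
\]
```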
Mar 30 2017 cs.CR
Inspired by the boom of the consumer IoT market, many device manufacturers, start-up companies and technology giants have jumped into the space. Unfortunately, the exciting utility and rapid marketization of IoT come at the expense of privacy and security. Industry reports and academic work have revealed many attacks on IoT systems, resulting in privacy leakage, property loss and large-scale availability problems. To mitigate such threats, a few solutions have been proposed. However, it is still unclear what impact they can have on the IoT ecosystem. In this work, we perform a comprehensive study of reported attacks and defenses in the realm of IoT, aiming to find out what we know, where current studies fall short, and how to move forward. To this end, we first build a toolkit that searches through a massive amount of online data using semantic analysis to identify over 3000 IoT-related articles. Further, by clustering the collected data using machine learning technologies, we are able to compare academic views with findings from industry and other sources, in an attempt to understand the gaps between them, the trend of IoT security risks, and new problems that need further attention. We systematize this process by proposing a taxonomy for the IoT ecosystem and organizing IoT security into five problem areas. We use this taxonomy as a beacon to assess each IoT work across a number of properties we define. Our assessment reveals that the relevant security and privacy problems are far from solved. We discuss how each proposed solution can be applied to a problem area and highlight their strengths, assumptions and constraints. We stress the need for a security framework for IoT vendors and discuss the trend of shifting security liability to external or centralized entities. We also identify open research problems and provide suggestions towards a secure IoT ecosystem.
Mar 27 2017 cs.CV
Recent years have witnessed the great success of convolutional neural networks (CNNs) for various problems in both low- and high-level vision. Especially noteworthy is the residual network, which was originally proposed to handle high-level vision problems and enjoys several merits. This paper aims to extend the merits of the residual network, such as the fast training induced by skip connections, to a typical low-level vision problem, i.e., single image super-resolution. In general, the two main challenges of existing deep CNNs for super-resolution lie in the gradient exploding/vanishing problem and the large number of parameters or computational cost as the CNN goes deeper. Correspondingly, skip connections or identity mapping shortcuts are utilized to avoid the gradient exploding/vanishing problem. To tackle the second problem, a parameter-economic CNN architecture with carefully designed width, depth and skip connections is proposed. Different residual-like architectures for image super-resolution have also been compared. Experimental results demonstrate that the proposed CNN model can not only achieve state-of-the-art PSNR and SSIM results for single image super-resolution but also produce visually pleasant results. This paper extends the MMM 2017 paper with more experiments and explanations.
Mar 07 2017 cs.CL
Computing the semantic similarity between concepts is an important foundation for many research works. This paper focuses on information content (IC) computing methods and IC measures, which estimate the semantic similarity between concepts by exploiting the topological parameters of a taxonomy. Based on an analysis of representative IC computing methods and typical semantic similarity measures, we propose a new hybrid IC computing method. Adopting the parameters dhyp and lch, we utilize the new IC computing method and propose a novel comprehensive measure of semantic similarity between concepts. An experiment based on the WordNet "is a" taxonomy was designed to test representative measures and our measure on the benchmark dataset R&G, and the results show that our measure noticeably improves similarity accuracy. We evaluate the proposed approach by comparing the correlation coefficients between five measures and the artificial data. The results show that our proposal outperforms the previous measures.
We study the problem of learning the support of transition matrix between random processes in a Vector Autoregressive (VAR) model from samples when a subset of the processes are latent. It is well known that ignoring the effect of the latent processes may lead to very different estimates of the influences among observed processes, and we are concerned with identifying the influences among the observed processes, those between the latent ones, and those from the latent to the observed ones. We show that the support of transition matrix among the observed processes and lengths of all latent paths between any two observed processes can be identified successfully under some conditions on the VAR model. From the lengths of latent paths, we reconstruct the latent subgraph (representing the influences among the latent processes) with a minimum number of variables uniquely if its topology is a directed tree. Furthermore, we propose an algorithm that finds all possible minimal latent graphs under some conditions on the lengths of latent paths. Our results apply to both non-Gaussian and Gaussian cases, and experimental results on various synthetic and real-world datasets validate our theoretical results.
We have developed an efficient information-maximization method for computing the optimal shapes of tuning curves of sensory neurons by optimizing the parameters of the underlying feedforward network model. When applied to the problem of population coding of visual motion with multiple directions, our method yields several types of tuning curves with both symmetric and asymmetric shapes that resemble those found in the visual cortex. Our result suggests that the diversity or heterogeneity of tuning curve shapes observed in neurophysiological experiments might actually constitute an optimal population representation of visual motions with multiple components.
Dec 28 2016 cs.CV
Video object segmentation is challenging due to factors such as fast motion, cluttered backgrounds, arbitrary object appearance variation and shape deformation. Most existing methods only explore appearance information between two consecutive frames, and thus do not make full use of the long-term nonlocal information that helps stabilize the learned appearance; hence they tend to fail when the targets undergo large viewpoint changes and significant non-rigid deformations. In this paper, we propose a simple yet effective approach that mines long-term spatio-temporally nonlocal appearance information for unsupervised video segmentation. The motivation of our algorithm comes from the spatio-temporal nonlocality of region appearance reoccurrence in a video. Specifically, we first generate a set of superpixels to represent the foreground and background, and then update the appearance of each superpixel with its long-term spatio-temporally nonlocal counterparts generated by approximate nearest neighbor search with an efficient KD-tree algorithm. With the updated appearances, we formulate a spatio-temporal graphical model comprised of superpixel label consistency potentials. Finally, we generate the segmentation by optimizing the graphical model via iteratively updating the appearance model and estimating the labels. Extensive evaluations on the SegTrack and Youtube-Objects datasets demonstrate the effectiveness of the proposed method, which performs favorably against state-of-the-art methods.
Dec 23 2016 cs.CR
Given a large number of low-level heterogeneous categorical alerts from an anomaly detection system, how to characterize complex relationships between different alerts, filter out false positives, and deliver trustworthy rankings and suggestions to end users? This problem is motivated by and generalized from applications in enterprise security and attack scenario reconstruction. While existing techniques focus on either reconstructing abnormal scenarios or filtering out false positive alerts, it can be more advantageous to consider the two perspectives simultaneously in order to improve detection accuracy and better understand anomaly behaviors. In this paper, we propose CAR, a collaborative alerts ranking framework that exploits both temporal and content correlations from heterogeneous categorical alerts. CAR first builds a tree-based model to capture both short-term correlations and long-term dependencies in each alert sequence, which identifies abnormal action sequences. Then, an embedding-based model is employed to learn the content correlations between alerts via their heterogeneous categorical attributes. Finally, by incorporating both temporal and content dependencies into one optimization framework, CAR ranks both alerts and their corresponding alert patterns. Our experiments, using real-world enterprise monitoring data and real attacks launched by professional hackers, show that CAR can accurately identify true positive alerts and successfully reconstruct attack scenarios at the same time.
We consider the conjecture by Aichholzer, Aurenhammer, Hurtado, and Krasser that any two point sets with the same cardinality and convex hulls of the same size can be triangulated in the "same" way, more precisely via compatible triangulations. We show counterexamples to various strengthened versions of this conjecture.
Dec 09 2016 cs.CV
This paper presents an efficient approach to image segmentation that approximates the piecewise-smooth (PS) functional with explicit solutions. By imposing some reasonable constraints on the initial conditions and the final solutions of the PS functional, we propose two novel formulations that approximate the explicit solutions of the evolution partial differential equations (PDEs) of the PS model, so that only one PDE needs to be solved efficiently. Furthermore, an energy term that regularizes the level set function to be a signed distance function is incorporated into our evolution formulation, and the time-consuming re-initialization is avoided. Experiments on synthetic and real images show that our method is more efficient than both the PS model and the local binary fitting (LBF) model, while having similar segmentation accuracy as the LBF model.
Dec 06 2016 cs.LG
Deep neural networks (DNNs) have proven to be quite effective in a vast array of machine learning tasks, with recent examples in cyber security and autonomous vehicles. Despite the superior performance of DNNs in these applications, it has recently been shown that these models are susceptible to a particular type of attack that exploits a fundamental flaw in their design. This attack consists of generating particular synthetic examples referred to as adversarial samples. These samples are constructed by slightly manipulating real data points in order to "fool" the original DNN model, forcing it to misclassify, with high confidence, samples that were previously classified correctly. Addressing this flaw is essential if DNNs are to be used in critical applications such as cyber security. Previous work has provided various learning algorithms to enhance the robustness of DNN models, but they all fall into the tactic of "security through obscurity": security can be guaranteed only if the learning algorithms can be kept hidden from adversaries. Once the learning technique is disclosed, DNNs protected by these defense mechanisms are still susceptible to adversarial samples. In this work, we investigate this issue shared across previous research and propose a generic approach to escalate a DNN's resistance to adversarial samples. More specifically, our approach integrates a data transformation module with a DNN, making it robust even if the underlying learning algorithm is revealed. To demonstrate the generality of our proposed approach and its potential for handling cyber security applications, we evaluate our method and several other existing solutions on publicly available datasets. Our results indicate that our approach typically provides superior classification performance and resistance in comparison with state-of-the-art solutions.
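Structurally, the defense pattern described is a transformation module composed in front of the classifier, with both parts assumed known to the adversary; in the sketch below (ours), an undercomplete autoencoder bottleneck is an illustrative stand-in for the paper's transformation.

```python
import torch
import torch.nn as nn

class TransformThenClassify(nn.Module):
    def __init__(self, dim=784, bottleneck=64, classes=10):
        super().__init__()
        self.transform = nn.Sequential(       # front-end transformation module
            nn.Linear(dim, bottleneck), nn.ReLU(),
            nn.Linear(bottleneck, dim), nn.ReLU(),
        )
        self.classifier = nn.Sequential(      # the protected DNN
            nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, classes),
        )

    def forward(self, x):
        # every input, clean or adversarial, passes through the transform first
        return self.classifier(self.transform(x))

logits = TransformThenClassify()(torch.randn(32, 784))   # (32, 10)
```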
A framework is presented for unsupervised learning of representations based on the infomax principle for large-scale neural populations. We use an asymptotic approximation to Shannon's mutual information for a large neural population to demonstrate that a good initial approximation to the global information-theoretic optimum can be obtained by a hierarchical infomax method. Starting from this initial solution, an efficient algorithm based on gradient descent of the final objective function is proposed to learn representations from input datasets; the method works for complete, overcomplete, and undercomplete bases. As confirmed by numerical experiments, our method is robust and highly efficient for extracting salient features from input datasets. Compared with the main existing methods, our algorithm has a distinct advantage in both training speed and the robustness of unsupervised representation learning. Furthermore, the proposed method is easily extended to supervised or unsupervised models for training deep structure networks.
While Shannon's mutual information has widespread applications in many disciplines, for practical applications it is often difficult to calculate its value accurately for high-dimensional variables because of the curse of dimensionality. This paper is focused on effective approximation methods for evaluating mutual information in the context of neural population coding. For large but finite neural populations, we derive several information-theoretic asymptotic bounds and approximation formulas that remain valid in high-dimensional spaces. We prove that optimizing the population density distribution based on these approximation formulas is a convex optimization problem which allows efficient numerical solutions. Numerical simulation results confirmed that our asymptotic formulas were highly accurate for approximating mutual information for large neural populations. In special cases, the approximation formulas are exactly equal to the true mutual information. We also discuss techniques of variable transformation and dimensionality reduction to facilitate computation of the approximations.
Nov 01 2016 cs.CV
In this paper, we present a simple yet effective Boolean map based representation that exploits connectivity cues for visual tracking. We describe a target object with histogram of oriented gradients and raw color features, each of which is characterized by a set of Boolean maps generated by uniformly thresholding its values. The Boolean maps effectively encode multi-scale connectivity cues of the target at different granularities. The fine-grained Boolean maps capture spatially structural details that are effective for precise target localization, while the coarse-grained ones encode global shape information that is robust to large target appearance variations. Finally, all the Boolean maps together form a robust representation that can be approximated by an explicit feature map of the intersection kernel, which is fed into a logistic regression classifier with online update, and the target location is estimated within a particle filter framework. The proposed representation scheme is computationally efficient and achieves favorable accuracy and robustness against state-of-the-art tracking methods on a large benchmark dataset of 50 image sequences.
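The Boolean map construction itself is straightforward; a minimal numpy sketch (ours, with an arbitrary number of thresholds) follows.

```python
import numpy as np

def boolean_maps(feature, n_thresholds=8):
    """feature: (H, W) map with values in [0, 1] (e.g. a HOG or color channel).
    Returns an (n_thresholds, H, W) stack of Boolean maps."""
    thresholds = np.linspace(0, 1, n_thresholds + 2)[1:-1]   # uniform, interior
    return np.stack([(feature > t).astype(np.float32) for t in thresholds])

feat = np.random.default_rng(0).random((16, 16))
maps = boolean_maps(feat)
print(maps.shape)    # (8, 16, 16): low thresholds keep shape, high keep detail
```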
We study the problem of nonparametric dependence detection. Many existing methods suffer severe power loss due to non-uniform consistency, which we illustrate with a paradox. To avoid such power loss, we approach the nonparametric test of independence through the new framework of binary expansion statistics (BEStat) and binary expansion testing (BET), which examine dependence through a novel binary expansion filtration approximation of the copula. Through a Hadamard-Walsh transform, we find that the cross interactions of binary variables in the filtration are complete sufficient statistics for dependence. These interactions are also uncorrelated under the null. By utilizing these interactions, the BET avoids the problem of non-uniform consistency and improves upon a wide class of commonly used methods (a) by achieving the minimax rate in sample size requirement for specified power and (b) by providing clear interpretations of global and local relationships upon rejection of independence. The binary expansion approach also connects the test statistics with the current computing system to facilitate efficient bitwise implementation. We illustrate the BET by a study of the distribution of stars in the night sky and by an exploratory data analysis of the TCGA breast cancer data.
Oct 06 2016 cs.LG
Beyond deep learning's highly publicized victories in Go, there have been numerous successful applications of the technique in information retrieval, computer vision and speech recognition. In cybersecurity, an increasing number of companies have become excited about the potential of deep learning, and have started to use it for various security incidents, the most popular being malware detection. These companies assert that deep learning (DL) could help turn the tide in the battle against malware infections. However, deep neural networks (DNNs) are vulnerable to adversarial samples, a flaw that plagues most, if not all, statistical learning models. Recent research has demonstrated that those with malicious intent can easily circumvent deep learning-powered malware detection by exploiting this flaw. To address this problem, previous work has developed various defense mechanisms that either augment training data or enhance the model's complexity. However, after a thorough analysis of the fundamental flaw in DNNs, we find that the effectiveness of current defenses is limited and, more importantly, cannot provide theoretical guarantees of robustness against adversarial sample-based attacks. As such, we propose a new adversary-resistant technique that obstructs attackers from constructing impactful adversarial samples by randomly nullifying features within samples. In this work, we evaluate our proposed technique against a real-world dataset with 14,679 malware variants and 17,399 benign programs. We theoretically validate the robustness of our technique, and empirically show that it significantly boosts DNN robustness to adversarial samples while maintaining high classification accuracy. To demonstrate the general applicability of our proposed method, we also conduct experiments on the MNIST and CIFAR-10 datasets, which are commonly used in image recognition research.
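The nullification operation is simple to state in code; below is a minimal numpy sketch (ours) of the random feature nullification idea, applied independently per sample so an attacker cannot predict which perturbed features survive.

```python
import numpy as np

def nullify_features(x, null_rate=0.3, rng=None):
    """x: (batch, features). Zero out a random null_rate fraction per sample."""
    rng = rng if rng is not None else np.random.default_rng()
    mask = rng.random(x.shape) >= null_rate   # True = keep the feature
    return x * mask

batch = np.ones((4, 10))
print(nullify_features(batch, 0.3, np.random.default_rng(1)))
```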
Sep 13 2016 cs.CV
Recently, deep neural networks based on the tanh activation function have shown impressive power in image denoising. However, much training time is needed because of their very large size. In this letter, we propose a dual-pathway rectifier neural network that combines two rectifier neurons with reversed input and output weights in the same hidden layer. We derive the equivalent activation function and show that it improves the efficiency of capturing information from noisy data. The experimental results show that our model outperforms other activation functions and achieves state-of-the-art denoising performance, while the network size and the training time are significantly reduced.
Anomaly detection plays an important role in modern data-driven security applications, such as detecting suspicious access to a socket from a process. In many cases, such events can be described as a collection of categorical values that are considered as entities of different types, which we call heterogeneous categorical events. Due to the lack of intrinsic distance measures among entities and the exponentially large event space, most existing work relies heavily on heuristics to calculate abnormality scores for events. Different from previous work, we propose a principled and unified probabilistic model, APE (Anomaly detection via Probabilistic pairwise interaction and Entity embedding), that directly models the likelihood of events. In this model, we embed entities into a common latent space using their observed co-occurrence in different events. More specifically, we first model the compatibility of each pair of entities according to their embeddings. Then we utilize the weighted pairwise interactions of different entity types to define the event probability. Using Noise-Contrastive Estimation with a "context-dependent" noise distribution, our model can be learned efficiently regardless of the large event space. Experimental results on real enterprise surveillance data show that our method can accurately detect abnormal events compared to other state-of-the-art anomaly detection techniques.
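A sketch of the pairwise-interaction scoring as we read it: each entity has an embedding, each pair of entity types has a weight, and an event's unnormalized score sums the weighted dot products over entity pairs (the NCE-based normalization and learning are omitted). Names and dimensions are ours.

```python
import numpy as np
from itertools import combinations

def event_score(event, embeddings, type_weights):
    """event: dict type -> entity id; embeddings: dict (type, id) -> vector;
    type_weights: dict frozenset({type_a, type_b}) -> scalar weight."""
    score = 0.0
    for (ta, ea), (tb, eb) in combinations(event.items(), 2):
        w = type_weights[frozenset((ta, tb))]
        score += w * embeddings[(ta, ea)] @ embeddings[(tb, eb)]
    return score        # unnormalized log-likelihood; NCE handles the rest

rng = np.random.default_rng(0)
emb = {("user", "alice"): rng.normal(size=8), ("proc", "sshd"): rng.normal(size=8),
       ("port", 22): rng.normal(size=8)}
w = {frozenset(p): 1.0 for p in [("user", "proc"), ("user", "port"), ("proc", "port")]}
print(event_score({"user": "alice", "proc": "sshd", "port": 22}, emb, w))
```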
There are multiple sides to every story, and while statistical topic models have been highly successful at topically summarizing the stories in corpora of text documents, they do not explicitly address the issue of learning the different sides, the viewpoints, expressed in the documents. In this paper, we show how these viewpoints can be learned in a completely unsupervised manner and represented in a human-interpretable form. We use a novel approach of applying CorrLDA2 for this purpose, which learns topic-viewpoint relations that can be used to form groups of topics, where each group represents a viewpoint. A corpus of documents about the Israeli-Palestinian conflict is then used to demonstrate how a Palestinian and an Israeli viewpoint can be learned. By leveraging the magnitudes and signs of the feature weights of a linear SVM, we introduce a principled method to evaluate associations between topics and viewpoints. With this, we demonstrate, both quantitatively and qualitatively, that the learned topic groups are contextually coherent and form consistently correct topic-viewpoint associations.
Aug 16 2016 cs.CV
Discriminative model learning for image denoising has recently attracted considerable attention due to its favorable denoising performance. In this paper, we take one step forward by investigating the construction of feed-forward denoising convolutional neural networks (DnCNNs), which embrace progress in very deep architectures, learning algorithms, and regularization methods for image denoising. Specifically, residual learning and batch normalization are utilized to speed up the training process as well as boost the denoising performance. Different from existing discriminative denoising models, which usually train a specific model for additive white Gaussian noise (AWGN) at a certain noise level, our DnCNN model is able to handle Gaussian denoising with unknown noise level (i.e., blind Gaussian denoising). With the residual learning strategy, DnCNN implicitly removes the latent clean image in the hidden layers. This property motivates us to train a single DnCNN model to tackle several general image denoising tasks, such as Gaussian denoising, single image super-resolution, and JPEG image deblocking. Our extensive experiments demonstrate that our DnCNN model not only exhibits high effectiveness in several general image denoising tasks but can also be efficiently implemented by benefiting from GPU computing.
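The recipe compresses into a few lines of PyTorch; the sketch below (ours) follows the described pattern of Conv+ReLU, stacked Conv+BatchNorm+ReLU layers, a final Conv, and residual learning where the network predicts the noise to subtract. The depth and width are our illustrative choices.

```python
import torch
import torch.nn as nn

class DnCNNSketch(nn.Module):
    def __init__(self, channels=1, features=64, depth=17):
        super().__init__()
        layers = [nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(features, features, 3, padding=1, bias=False),
                       nn.BatchNorm2d(features), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(features, channels, 3, padding=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, noisy):
        return noisy - self.net(noisy)        # residual learning: predict noise

denoised = DnCNNSketch()(torch.randn(1, 1, 40, 40))
```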
Aug 10 2016 cs.CR
An intrusion detection system (IDS) is an important part of enterprise security system architecture. In particular, anomaly-based IDSs have been widely applied to detect abnormal process behaviors that deviate from the majority. However, such abnormal behavior usually consists of a series of low-level heterogeneous events. The gap between the low-level events and the high-level abnormal behaviors makes it hard to infer which single events are related to the real abnormal activities, especially considering that there are massive "noisy" low-level events happening in between. Hence, existing work that focuses on detecting single entities/events can hardly achieve high detection accuracy. Different from previous work, we design and implement GID, an efficient graph-based intrusion detection technique that can identify abnormal event sequences from massive heterogeneous process traces with high accuracy. GID first builds a compact graph structure to capture the interactions between different system entities. The suspiciousness or anomaly score of process paths is then measured by applying a random walk technique to the constructed directed acyclic graph. To eliminate the score bias caused by path length, a Box-Cox power transformation based approach is introduced to normalize the anomaly scores, so that the scores of paths of different lengths have the same distribution. The efficiency of suspicious path discovery is further improved by a proposed optimization scheme. We fully implement our GID algorithm and deploy it in a real enterprise security system, where it greatly helps detect advanced threats and optimize incident response. Experiments on system monitoring datasets show that GID is efficient (about 2 million records per minute) and accurate (detection rate higher than 80%).
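The length-debiasing step can be sketched as follows (our rendering, with scipy's Box-Cox as the power transformation): scores are grouped by path length, and each group is transformed and standardized so that scores of different lengths become comparable.

```python
import numpy as np
from scipy import stats

def normalize_scores_by_length(scores, lengths):
    """scores, lengths: 1-D arrays of positive path scores and path lengths."""
    scores, lengths = np.asarray(scores, float), np.asarray(lengths)
    out = np.empty_like(scores)
    for L in np.unique(lengths):
        idx = lengths == L
        transformed, _ = stats.boxcox(scores[idx])       # per-length Box-Cox
        out[idx] = (transformed - transformed.mean()) / (transformed.std() + 1e-12)
    return out

rng = np.random.default_rng(0)
s = np.concatenate([rng.lognormal(0, 1, 50), rng.lognormal(2, 1, 50)])
l = np.array([3] * 50 + [5] * 50)
print(normalize_scores_by_length(s, l)[:5])   # comparable across lengths
```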
Aug 10 2016 cs.CV
A family of residual networks with hundreds or even thousands of layers dominates major image recognition tasks, but building a network by simply stacking residual blocks inevitably limits its optimization ability. This paper proposes a novel residual-network architecture, Residual networks of Residual networks (RoR), to tap the optimization potential of residual networks. RoR substitutes the optimization of residual mappings of residual mappings for the optimization of the original residual mappings. In particular, RoR adds level-wise shortcut connections upon the original residual networks to promote their learning capability. More importantly, RoR can be applied to various kinds of residual networks (ResNets, Pre-ResNets and WRN) and significantly boost their performance. Our experiments demonstrate the effectiveness and versatility of RoR, which achieves the best performance among all residual-network-like structures. Our RoR-3-WRN58-4+SD models achieve new state-of-the-art results on CIFAR-10, CIFAR-100 and SVHN, with test errors of 3.77%, 19.73% and 1.59%, respectively. RoR-3 models also achieve state-of-the-art results compared to ResNets on the ImageNet data set.
Jul 26 2016 cs.LG
Matrix sketching aims to find close approximations of a matrix by factors of much smaller dimensions, which has important applications in optimization and machine learning. Given a matrix A of size m by n, state-of-the-art randomized algorithms take O(m * n) time and space to obtain its low-rank decomposition. Although quite useful, the need to store or manipulate the entire matrix makes this a computational bottleneck for truly large and dense inputs. Can we sketch an m-by-n matrix in O(m + n) cost by accessing only a small fraction of its rows and columns, without knowing anything about the remaining data? In this paper, we propose the cascaded bilateral sampling (CABS) framework to solve this problem. We start by demonstrating how the approximation quality of bilateral matrix sketching depends on the encoding power of the sampling: the sampled rows and columns should correspond to the code-vectors in the ground-truth decompositions. Motivated by this analysis, we propose to first generate a pilot sketch using simple random sampling, and then pursue more advanced, "follow-up" sampling on the pilot-sketch factors, seeking maximal encoding power. In this cascading process, the rise in approximation quality is shown to be lower-bounded by the improvement of encoding power in the follow-up sampling step, which theoretically guarantees the algorithmic boosting property. Computationally, our framework takes only linear time and space, while its performance rivals the quality of state-of-the-art algorithms consuming a quadratic amount of resources. Empirical evaluations on benchmark data fully demonstrate the potential of our methods in large-scale matrix sketching and related areas.
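A simplified sketch of the cascade (ours; in particular, the squared-norm follow-up weights are only a stand-in for the paper's encoding-power-driven sampling):

```python
import numpy as np

def cabs_sketch(A, k, rng=None):
    """A: (m, n) matrix; returns row/column index sets of size k."""
    rng = rng if rng is not None else np.random.default_rng()
    m, n = A.shape
    # stage 1: pilot sketch from uniform row/column sampling
    r0 = rng.choice(m, size=k, replace=False)
    c0 = rng.choice(n, size=k, replace=False)
    # stage 2: follow-up sampling weighted by the pilot's view of the data,
    # here approximated by squared norms of the sampled cross-sections
    row_w = (A[:, c0] ** 2).sum(axis=1); row_w /= row_w.sum()
    col_w = (A[r0, :] ** 2).sum(axis=0); col_w /= col_w.sum()
    rows = rng.choice(m, size=k, replace=False, p=row_w)
    cols = rng.choice(n, size=k, replace=False, p=col_w)
    return rows, cols        # only O((m + n) k) entries of A are ever touched

A = np.random.default_rng(0).normal(size=(100, 80))
print(cabs_sketch(A, 10, np.random.default_rng(1)))
```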
We propose a novel supervised learning technique for summarizing videos by automatically selecting keyframes or key subshots. Casting the problem as a structured prediction problem on sequential data, our main idea is to use Long Short-Term Memory (LSTM), a special type of recurrent neural network, to model the variable-range dependencies entailed in the task of video summarization. Our learning models attain state-of-the-art results on two benchmark video datasets. Detailed analysis justifies the design of the models. In particular, we show that it is crucial to take into consideration the sequential structures in videos and to model them. Besides advances in modeling techniques, we introduce techniques to address the need for a large amount of annotated data for training complex learning models. Here, our main idea is to exploit the existence of auxiliary annotated video datasets, albeit heterogeneous in visual styles and contents. Specifically, we show that domain adaptation techniques can improve summarization by reducing the discrepancies in statistical properties across those datasets.
May 24 2016 cs.CR
In this paper, we show that security threats arising from the existing GPU memory management strategy have been overlooked; this opens a back door for adversaries to freely break memory isolation, enabling adversaries without any privilege on a computer to directly recover the raw memory data left by previous processes. More importantly, such attacks work not only on normal multi-user operating systems, but also on cloud computing platforms. To demonstrate the seriousness of these attacks, we recovered original data directly from GPU memory residues left by exited commodity applications, including Google Chrome, Adobe Reader, GIMP and Matlab. The results show that, because of the vulnerable memory management strategy, all the commodity applications in our experiments are affected.
Tensor factorization is a powerful tool for analysing multi-way data. Compared with traditional multi-linear methods, nonlinear tensor factorization models are capable of capturing more complex relationships in the data. However, they are computationally expensive and may suffer severe learning bias in the case of extreme data sparsity. To overcome these limitations, in this paper we propose a distributed, flexible nonlinear tensor factorization model. Our model can effectively avoid the expensive computations and structural restrictions of the Kronecker product in existing TGP formulations, allowing an arbitrary subset of tensor entries to be selected to contribute to the training. At the same time, we derive a tractable and tight variational evidence lower bound (ELBO) that enables highly decoupled, parallel computations and high-quality inference. Based on the new bound, we develop a distributed inference algorithm in the MapReduce framework, which is key-value-free and can fully exploit the memory cache mechanism in fast MapReduce systems such as SPARK. Experimental results fully demonstrate the advantages of our method over several state-of-the-art approaches, in terms of both predictive performance and computational efficiency. Moreover, our approach shows promising potential in the application of Click-Through-Rate (CTR) prediction for online advertising.
Apr 12 2016 cs.CV
Face detection and alignment in unconstrained environments are challenging due to various poses, illuminations and occlusions. Recent studies show that deep learning approaches can achieve impressive performance on these two tasks. In this paper, we propose a deep cascaded multi-task framework that exploits the inherent correlation between detection and alignment to boost their performance. In particular, our framework adopts a cascaded structure with three stages of carefully designed deep convolutional networks that predict face and landmark locations in a coarse-to-fine manner. In addition, we propose a new online hard sample mining strategy that improves performance automatically without manual sample selection. Our method achieves superior accuracy over the state-of-the-art techniques on the challenging FDDB and WIDER FACE benchmarks for face detection and the AFLW benchmark for face alignment, while keeping real-time performance.
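The online hard sample mining strategy is easy to sketch: within each mini-batch, only the hardest fraction of per-sample losses contributes to the gradient. The PyTorch snippet below is our illustration; the 70% keep ratio is an assumption.

```python
import torch
import torch.nn.functional as F

def hard_mining_loss(logits, targets, keep_ratio=0.7):
    """Keep only the hardest keep_ratio fraction of per-sample losses."""
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    k = max(1, int(keep_ratio * per_sample.numel()))
    hardest, _ = torch.topk(per_sample, k)     # largest losses = hard samples
    return hardest.mean()

logits = torch.randn(32, 2, requires_grad=True)   # face / non-face scores
targets = torch.randint(0, 2, (32,))
hard_mining_loss(logits, targets).backward()
```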
Learning the influence structure of multiple time series data is of great interest to many disciplines. This paper studies the problem of recovering the causal structure in a network of multivariate linear Hawkes processes. In such processes, the occurrence of an event in one process affects the probability of occurrence of new events in some other processes; thus, a natural notion of causality exists between such processes, captured by the support of the excitation matrix. We show that the resulting causal influence network is equivalent to the Directed Information graph (DIG) of the processes, which encodes the causal factorization of the joint distribution of the processes. Furthermore, we present an algorithm for learning the support of the excitation matrix (or equivalently the DIG). The performance of the algorithm is evaluated on synthesized multivariate Hawkes networks as well as real-world stock market and MemeTracker datasets.
Mar 11 2016 cs.CV
Video summarization has unprecedented importance in helping us digest, browse, and search today's ever-growing video collections. We propose a novel subset selection technique that leverages supervision in the form of human-created summaries to perform automatic keyframe-based video summarization. The main idea is to nonparametrically transfer summary structures from annotated videos to unseen test videos. We show how to extend our method to exploit semantic side information about the video's category/genre, guiding the transfer process with those training videos that are semantically consistent with the test input. We also show how to generalize our method to subshot-based summarization, which not only reduces computational costs but also provides more flexible ways of defining visual similarity across subshots spanning several frames. We conduct extensive evaluation on several benchmarks and demonstrate promising results, outperforming existing methods in several settings.
We study the spherical cap packing problem with a probabilistic approach. Such probabilistic considerations result in an asymptotic sharp universal uniform bound on the maximal inner product between any set of unit vectors and a stochastically independent uniformly distributed unit vector. When the set of unit vectors are themselves independently uniformly distributed, we further develop the extreme value distribution limit of the maximal inner product, which characterizes its uncertainty around the bound. As applications of the above asymptotic results, we derive (1) an asymptotic sharp universal uniform bound on the maximal spurious correlation, as well as its uniform convergence in distribution when the explanatory variables are independently Gaussian distributed; and (2) an asymptotic sharp universal bound on the maximum norm of a low-rank elliptically distributed vector, as well as related limiting distributions. With these results, we develop a fast detection method for a low-rank structure in high-dimensional Gaussian data without using the spectrum information.
It is commonplace to encounter nonstationary data, of which the underlying generating process may change over time or across domains. The nonstationarity presents both challenges and opportunities for causal discovery. In this paper we propose a principled framework to handle nonstationarity, and develop some methods to address three important questions. First, we propose an enhanced constraint-based method to detect variables whose local mechanisms are nonstationary and recover the skeleton of the causal structure over observed variables. Second, we present a way to determine some causal directions by taking advantage of information carried by changing distributions. Third, we develop a method for visualizing the nonstationarity of causal modules. Experimental results on various synthetic and real-world data sets are presented to demonstrate the efficacy of our methods.
A fog computing based radio access network (F-RAN) is presented in this article as a promising paradigm for the fifth generation (5G) wireless communication system to provide high spectral and energy efficiency. The core idea is to take full advantage of local radio signal processing, cooperative radio resource management, and distributed storage capabilities in edge devices, which can decrease the heavy burden on fronthaul and avoid large-scale radio signal processing in the centralized baseband unit pool. This article comprehensively presents the system architecture and key techniques of F-RANs. In particular, key techniques and their corresponding solutions, including transmission mode selection and interference suppression, are discussed. Open issues in terms of edge caching, software-defined networking, and network function virtualization are also identified.
Recent developments in structural equation modeling (SEM) have produced several methods that can usually distinguish cause from effect in the two-variable case. For that purpose, however, one has to impose substantial structural constraints or smoothness assumptions on the functional causal models. In this paper, we consider the problem of determining the causal direction from a related but different point of view, and propose a new framework for causal direction determination. We show that it is possible to perform causal inference based on the condition that the cause is "exogenous" for the parameters involved in the generating process from the cause to the effect. In this way, we avoid the structural constraints required by the SEM-based approaches. In particular, we exploit nonparametric methods to estimate marginal and conditional distributions, and propose a bootstrap-based approach to test for the exogeneity condition; the testing results indicate the causal direction between two variables. The proposed method is validated on both synthetic and real data.
Apr 22 2015 cs.CV
Recently, the compressive tracking (CT) method has attracted much attention due to its high efficiency, but it cannot deal well with large-scale target appearance variations because its data-independent random projection matrix results in less discriminative features. To address this issue, in this paper we propose an adaptive CT approach, which selects the most discriminative features to design an effective appearance model. Our method significantly improves CT in three aspects: firstly, the most discriminative features are selected via an online vector boosting method; secondly, the object representation is updated in an effective online manner, which preserves the stable features while filtering out the noisy ones; finally, a simple and effective trajectory rectification approach is adopted to make the estimated location more accurate. Extensive experiments on the CVPR2013 tracking benchmark demonstrate the superior performance of our algorithm compared with state-of-the-art tracking algorithms.
Apr 10 2015 cs.SI
The proliferation of mobile handheld devices, in combination with technological advancements in mobile computing, has led to a number of innovative services that make use of the location information available on such devices. Traditional yellow pages websites have now moved to mobile platforms, giving local businesses and potential, near-by customers the opportunity to connect. These platforms can offer an affordable advertisement channel to local businesses. One of the mechanisms offered by location-based social networks (LBSNs) allows businesses to provide special offers to customers who connect through the platform. We collect a large time-series dataset from approximately 14 million venues on Foursquare and analyze the performance of such campaigns using randomization techniques and (non-parametric) hypothesis testing with statistical bootstrapping. Our main finding indicates that such promotions are not as effective as anecdotal success stories might suggest. Finally, we design classifiers, by extracting three different types of features, that are able to provide an educated decision on whether a special offer campaign for a local business will succeed or not, in both the short and the long term.
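The bootstrap side of the methodology can be sketched as follows; the daily check-in counts and the campaign-lift framing below are toy stand-ins for the Foursquare data, not the paper's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(2)

def bootstrap_lift_ci(before, after, n_boot=10000, alpha=0.05):
    """Percentile-bootstrap confidence interval for the change in mean daily check-ins."""
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        diffs[b] = (rng.choice(after, size=after.size).mean()
                    - rng.choice(before, size=before.size).mean())
    return np.quantile(diffs, [alpha / 2.0, 1.0 - alpha / 2.0])

# toy daily check-in counts around a hypothetical campaign launch
before = rng.poisson(20, size=60).astype(float)
after = rng.poisson(21, size=60).astype(float)

lo, hi = bootstrap_lift_ci(before, after)
print(f"95% CI for the mean lift: [{lo:.2f}, {hi:.2f}]")  # interval covering 0 => no detectable effect
```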
Mar 27 2015 cs.NI
Increasing sources of sensor measurements and prior knowledge have become available for indoor localization on smartphones. How to effectively utilize these sources to enhance localization accuracy is an important yet challenging problem. In this paper, we present an area state-aided localization algorithm that exploits various sources of information. Specifically, we introduce the concept of an area state to indicate the area where the user is on an indoor map. The position of the user is then estimated using inertial measurement unit (IMU) measurements with the aid of area states. The area states are in turn updated based on the position estimates. To avoid the accumulated errors of IMU measurements, our algorithm uses the WiFi received signal strength indicator (RSSI) to indicate the vicinity of the user to the routers. The experimental results show that our system can achieve satisfactory localization accuracy in a typical indoor environment.
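The interplay between IMU dead reckoning and WiFi RSSI anchoring can be sketched as below; this deliberately omits the paper's area-state machinery and indoor-map constraints, and all thresholds, names and numbers are illustrative assumptions.

```python
import numpy as np

def dead_reckon(pos, heading_rad, step_len):
    """One pedestrian dead-reckoning update from IMU step detection and heading."""
    return pos + step_len * np.array([np.cos(heading_rad), np.sin(heading_rad)])

def rssi_anchor(pos, router_pos, rssi_dbm, near_thresh_dbm=-50.0, pull=0.5):
    """When the RSSI indicates the user is very close to a known router, pull the
    estimate toward the router, discharging part of the accumulated IMU drift."""
    if rssi_dbm > near_thresh_dbm:
        return pos + pull * (router_pos - pos)
    return pos

pos = np.array([0.0, 0.0])
router = np.array([5.0, 0.0])               # known router location
walk = [(0.0, 0.7)] * 8                     # (heading, step length) pairs from the IMU
rssi_trace = [-80.0] * 7 + [-45.0]          # a strong reading on the final step

for (heading, step), rssi in zip(walk, rssi_trace):
    pos = dead_reckon(pos, heading, step)
    pos = rssi_anchor(pos, router, rssi)

print(pos)   # drift is partially corrected toward the router on the final step
```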
Jan 20 2015 cs.CV
Deep networks have been successfully applied to visual tracking by learning a generic representation offline from numerous training images. However, the offline training is time-consuming and the learned generic representation may be less discriminative for tracking specific objects. In this paper we show that, even without offline training on a large amount of auxiliary data, simple two-layer convolutional networks can be powerful enough to develop a robust representation for visual tracking. In the first frame, we employ the k-means algorithm to extract a set of normalized patches from the target region as fixed filters, which are integrated with a series of adaptive contextual filters surrounding the target to define a set of feature maps in the subsequent frames. These maps measure similarities between each filter and the useful local intensity patterns across the target, thereby encoding its local structural information. Furthermore, all the maps together form a global representation, which is built on mid-level features and thereby remains close to image-level information, so the inner geometric layout of the target is also well preserved. A simple soft shrinkage method with an adaptive threshold is employed to de-noise the global representation, resulting in a robust sparse representation. The representation is updated via a simple and effective online strategy, allowing it to adapt robustly to target appearance variations. Our convolutional networks have a surprisingly lightweight structure, yet perform favorably against several state-of-the-art methods on the CVPR2013 tracking benchmark dataset with 50 challenging videos.
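The de-noising step is standard enough to show directly: soft shrinkage zeroes small-magnitude responses and shrinks the rest toward zero. The median-based threshold below is one plausible adaptive choice made for illustration; the paper's exact rule may differ.

```python
import numpy as np

def soft_shrink(x, tau):
    """Soft shrinkage: zero entries with magnitude below tau, shrink the rest toward zero."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

rng = np.random.default_rng(6)
feature_maps = rng.standard_normal((16, 32, 32))   # a toy global representation
tau = np.median(np.abs(feature_maps))              # one plausible adaptive threshold (assumption)
sparse_rep = soft_shrink(feature_maps, tau)
print(f"{(sparse_rep == 0).mean():.2f} of the entries are zeroed")   # about half, by construction
```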
Dec 30 2014 cs.DS
The primary structure of a ribonucleic acid (RNA) molecule can be represented as a sequence of nucleotides (bases) over the alphabet A, C, G, U. The secondary or tertiary structure of an RNA is a set of base pairs which form bonds between A-U and G-C. For secondary structures, these bonds have traditionally been assumed to be one-to-one and non-crossing. This paper considers pattern matching as well as local alignment between two RNA structures. For pattern matching, we present two algorithms, one for obtaining an exact match and the other for an approximate match. We then present an algorithm for RNA local structural alignment.
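As a self-contained illustration of the one-to-one, non-crossing pairing model assumed here (this is background, not one of the paper's matching algorithms), a Nussinov-style dynamic program computes the maximum number of non-crossing A-U / G-C pairs in a single sequence:

```python
def nussinov(seq, min_loop=3):
    """Maximum number of non-crossing A-U / G-C base pairs (Nussinov-style DP);
    min_loop enforces at least min_loop unpaired bases inside any pair."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):          # interval length j - i
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]                  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):     # case 2: base j pairs with some k
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]

print(nussinov("GGGAAAUCC"))   # -> 2: a G-C stem of depth two closing the loop
```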
Taking full advantage of both heterogeneous networks (HetNets) and cloud radio access networks (CRANs), heterogeneous cloud radio access networks (H-CRANs) are presented to enhance both spectral and energy efficiency (EE): remote radio heads (RRHs) are mainly used to provide high data rates for users with high quality of service (QoS) requirements, while the high power node (HPN) is deployed to guarantee seamless coverage and serve users with low QoS requirements. To mitigate the inter-tier interference and improve EE performance in H-CRANs, user association with RRH/HPN is characterized in this paper, and the traditional soft fractional frequency reuse (S-FFR) is enhanced. Based on the RRH/HPN association constraint and the enhanced S-FFR, an energy-efficient optimization problem for resource assignment and power allocation in orthogonal frequency division multiple access (OFDMA) based H-CRANs is formulated with a non-convex objective function. To deal with the non-convexity, an equivalent convex feasibility problem is reformulated, and closed-form expressions for the energy-efficient resource allocation solution, jointly allocating resource blocks and transmit power, are derived via the Lagrange dual decomposition method. Simulation results confirm that the H-CRAN architecture and the corresponding resource allocation solution can enhance energy efficiency significantly.
Dec 01 2014 cs.NI
Mobile sensing has become a promising paradigm for mobile users to obtain information through task crowdsourcing. However, due to the social preferences of mobile users, the quality of sensing reports may be impacted by the underlying social attributes and selfishness of individuals. Therefore, it is crucial to consider the social impacts and trustworthiness of mobile users when selecting task participants in mobile sensing. In this paper, we propose a Social Aware Crowdsourcing with Reputation Management (SACRM) scheme to select well-suited participants and allocate the task rewards in mobile sensing. Specifically, we consider the social attributes, task delay and reputation in crowdsourcing and propose a participant selection scheme to choose the well-suited participants for a sensing task under a fixed task budget. A report assessment and rewarding scheme is also introduced to measure the quality of the sensing reports and allocate the task rewards based on the assessed report quality. In addition, we develop a reputation management scheme to evaluate the trustworthiness and cost-performance ratio of mobile users for participant selection. Theoretical analysis and extensive simulations demonstrate that SACRM can efficiently improve the crowdsourcing utility and effectively stimulate participants to improve the quality of their sensing reports.
The original concept of cognitive radio (CR) raised by Mitola targets all the environment parameters, including those in the physical layer, MAC layer and application layer, as well as information extracted by reasoning. Hence the first CR is also referred to as "full cognitive radio". However, due to its difficult implementation, the FCC and Simon Haykin separately proposed a much simpler definition, in which CR mainly detects one single parameter, i.e., spectrum occupancy, and is also called "spectrum sensing cognitive radio". With the rapid development of wireless communication, the infrastructure of a wireless system becomes much more complicated, while the functionality at every node is desired to be as intelligent as possible, such as the self-organization capability in the approaching 5G cellular networks. It is then interesting to revisit Mitola's definition and ask whether one could, besides obtaining the "on/off" status of the licensed user, acquire more parameters in a cognitive way. In this article, we propose a new cognitive architecture targeting multiple parameters in future cellular networks, which is one step further towards "full cognition" compared to most existing CR research. The new architecture is elaborated in detailed stages, and three representative examples are provided based on recent research progress to illustrate the feasibility as well as the validity of the proposed architecture.
In this paper, we discuss the method of Bayesian regression and its efficacy for predicting price variation of Bitcoin, a recently popularized virtual, cryptographic currency. Bayesian regression refers to utilizing empirical data as a proxy to perform Bayesian inference. We utilize Bayesian regression for the so-called "latent source model". The Bayesian regression for the "latent source model" was introduced and discussed by Chen, Nikolov and Shah (2013) and Bresler, Chen and Shah (2014) for the purpose of binary classification; they established theoretical as well as empirical efficacy of the method in that setting. In this paper, we instead utilize it to predict a real-valued quantity, the price of Bitcoin. Based on this price prediction method, we devise a simple strategy for trading Bitcoin that is able to nearly double the investment in less than a 60-day period when run against a real data trace.
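Concretely, the latent-source-style estimator can be sketched as an exponentially weighted average of historical price changes, weighted by the similarity of the current price window to historical windows; the correlation-based similarity, the constant c, and the toy data below are our illustrative assumptions rather than the paper's exact estimator.

```python
import numpy as np

def predict_change(window, hist_windows, hist_changes, c=5.0):
    """Latent-source-style prediction: weight each historical outcome by
    exp(c * similarity) between the current window and that historical window."""
    w = (window - window.mean()) / (window.std() + 1e-12)   # normalize the query window
    sims = hist_windows @ w / w.size                        # empirical correlation as similarity
    weights = np.exp(c * sims)
    return float(weights @ hist_changes / weights.sum())

rng = np.random.default_rng(3)
# toy history: 500 normalized 30-sample price windows and the change that followed each
hist = rng.standard_normal((500, 30))
hist = (hist - hist.mean(axis=1, keepdims=True)) / hist.std(axis=1, keepdims=True)
changes = 0.1 * rng.standard_normal(500)

print(predict_change(rng.standard_normal(30), hist, changes))
```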
Jul 22 2014 cs.CR
With millions of apps available for download from official or third-party markets, Android has become one of the most popular mobile platforms today. These apps help people in all kinds of ways and thus have access to lots of users' data, which in general falls into three categories: sensitive data, data to be shared with other apps, and non-sensitive data not to be shared with others. For the first and second types of data, Android provides good storage models: an app's private sensitive data are saved to its private folder that can only be accessed by the app itself, and the data to be shared are saved to public storage (either the external SD card or the emulated SD card area on internal FLASH memory). But for the last type, i.e., an app's non-sensitive and non-shared data, there is a big problem in Android's current storage model, which essentially encourages an app to save its non-sensitive data to shared public storage that can be accessed by other apps. At first glance, it seems harmless to do so, as those data are non-sensitive after all, but it implicitly assumes that app developers can correctly identify all sensitive data and prevent all possible information leakage from private-but-non-sensitive data. In this paper, we demonstrate that this is an invalid assumption with a thorough survey of information leaks in apps that followed Android's recommended storage model for non-sensitive data. Our studies show that highly sensitive information from billions of users can be easily hacked by exploiting this problematic storage model. Although our empirical studies are based on a limited set of apps, the identified problems are not isolated or accidental bugs of the apps investigated. On the contrary, the problem is rooted in the vulnerable storage model recommended by Android. To mitigate the threat, we also propose a defense framework.
Jul 21 2014 cs.CR
Previous research on sensor-based attacks on the Android platform focused mainly on accessing or controlling sensitive device components, such as the camera, microphone and GPS. These approaches obtain data from sensors directly and need the corresponding sensor-invoking permissions. This paper presents a novel approach (GVS-Attack) to launch permission-bypassing attacks from a zero-permission Android application (VoicEmployer) through the speaker. The idea of GVS-Attack is to utilize an Android system built-in voice assistant module -- Google Voice Search. Through the Android Intent mechanism, VoicEmployer brings Google Voice Search to the foreground and then plays prepared audio files (like "call number 1234 5678") in the background. Google Voice Search recognizes this voice command and executes the corresponding operations. With ingenious designs, our GVS-Attack can forge SMS/Email, access private information, transmit sensitive data and achieve remote control without any permission. We also found a vulnerability in the status checking of the Google Search app, which can be utilized by GVS-Attack to dial arbitrary numbers even when the phone is securely locked with a password. A prototype of VoicEmployer has been implemented to demonstrate the feasibility of GVS-Attack in the real world. In theory, nearly all Android devices equipped with the Google Services Framework can be affected by GVS-Attack. This study may inspire application developers and researchers to rethink the assumption that zero permission means safety; the speaker can be treated as a new attack surface.
Jul 04 2014 cs.CR
The popularity of mobile devices has made people's lives more convenient but has threatened their privacy at the same time. As end users become more and more concerned about the protection of their private information, it is ever harder to track a specific user using conventional technologies. For example, cookies might be cleared by users regularly; Apple has stopped apps from accessing UDIDs, and Android phones use a special permission to protect IMEI codes. To address this challenge, some recent studies have worked on tracing smartphones using hardware features resulting from the imperfect manufacturing process. These works have demonstrated that different devices can be differentiated from each other. However, such features still have a long way to go before they can replace cookies and be deployed in real-world scenarios, especially in terms of properties like uniqueness and robustness. In this paper, we present a novel method to stealthily generate a stable and unique device ID for smartphones by exploiting the frequency response of the speaker. With carefully selected audio frequencies and special sound wave patterns, we can reduce the impact of non-linear effects and noise, and keep our feature extraction process unnoticeable to users. The extracted feature is not only very stable for a given smartphone speaker, but also unique to that phone. The feature contains rich information, equivalent to around 40 bits of entropy, which is enough to identify billions of different smartphones of the same model. We have built a prototype to evaluate our method, and the results show that the generated device ID can be used as a replacement for cookies.
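The core of such a fingerprint can be sketched in a few lines: play known tones, measure the magnitude response per band, and quantize against the median to get one bit per band (40 bands for roughly 40 bits). The probe band, tone spacing and quantization rule below are our illustrative assumptions, and the "recording" is synthesized rather than captured from a real speaker.

```python
import numpy as np

def response_bits(recording, fs, probe_freqs):
    """Speaker-fingerprint sketch: magnitude response at each probe tone,
    quantized against the median response to yield one bit per band."""
    spectrum = np.abs(np.fft.rfft(recording))
    freqs = np.fft.rfftfreq(recording.size, d=1.0 / fs)
    mags = np.array([spectrum[np.argmin(np.abs(freqs - f))] for f in probe_freqs])
    return (mags > np.median(mags)).astype(int)

fs = 44100
t = np.arange(fs) / fs                          # one second of audio
probe_freqs = np.arange(14000, 18000, 100)      # 40 probe tones -> ~40 bits
# toy "recording": the probe tones colored by an unknown per-device gain curve
gains = 1.0 + 0.3 * np.sin(np.linspace(0.0, 6.0, probe_freqs.size))
recording = sum(g * np.sin(2 * np.pi * f * t) for g, f in zip(gains, probe_freqs))

print(response_bits(recording, fs, probe_freqs))
```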
Jun 26 2014 cs.IR
Online encyclopedias such as Wikipedia have become one of the best sources of knowledge. Much effort has been devoted to expanding and enriching their structured data by automatic information extraction from unstructured text in Wikipedia. Although remarkable progress has been made, the effectiveness and efficiency of such approaches are still limited, as they tackle an extremely difficult natural language understanding problem and rely heavily on supervised learning approaches that require a large amount of effort to label training data. In this paper, instead of performing information extraction over unstructured natural language text directly, we focus on a rich set of semi-structured data in Wikipedia articles: linked entities. The idea of this paper is the following: if we can summarize the relationship between an entity and its linked entities, we immediately harvest some of the most important information about the entity. To this end, we propose a novel rank aggregation approach to remove noise, together with an effective clustering and labeling algorithm to extract knowledge.
In this paper, based on coupled social networks (CSN), we propose a hybrid algorithm to nonlinearly integrate both the social and the behavioral information of online users. The filtering algorithm, built on the coupled social networks, considers the effects of both social influence and personalized preference. Experimental results on two real datasets, Epinions and Friendfeed, show that the hybrid pattern can not only provide more accurate recommendations, but can also enlarge the recommendation coverage while adopting a global metric. Further empirical analyses demonstrate that the mutual reinforcement and rich-club phenomena can also be found in coupled social networks, where the same individuals occupy the core positions of the online system. This work may shed some light on the in-depth understanding of the structure and function of coupled social networks.
Mar 04 2014 cs.CV
Human beings process stereoscopic correspondence across multiple scales. However, this bio-inspiration is ignored by state-of-the-art cost aggregation methods for dense stereo correspondence. In this paper, a generic cross-scale cost aggregation framework is proposed to allow multi-scale interaction in cost aggregation. We first reformulate cost aggregation from a unified optimization perspective and show that different cost aggregation methods essentially differ in their choices of similarity kernels. Then, an inter-scale regularizer is introduced into the optimization, and solving the new optimization problem leads to the proposed framework. Since the regularization term is independent of the similarity kernel, various cost aggregation methods can be integrated into the proposed general framework. We show that the cross-scale framework is important, as it effectively and efficiently extends state-of-the-art cost aggregation methods and leads to significant improvements when evaluated on the Middlebury, KITTI and New Tsukuba datasets.
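One generic instantiation of this setup, written as an illustration consistent with the abstract rather than as a quotation of the paper's exact objective, couples the per-scale aggregated costs to the raw per-scale costs and to the neighboring scales:

\[
\{\hat{z}^{s}\}_{s=0}^{S} \;=\; \operatorname*{arg\,min}_{\{z^{s}\}} \;\sum_{s=0}^{S} \bigl\| z^{s} - \tilde{c}^{\,s} \bigr\|^{2} \;+\; \lambda \sum_{s=1}^{S} \bigl\| z^{s} - z^{s-1} \bigr\|^{2},
\]

where \(\tilde{c}^{\,s}\) is the cost aggregated at scale \(s\) by any chosen similarity kernel and \(\lambda\) controls the inter-scale coupling. Since this objective is a convex quadratic, its optimality conditions link each scale only to its neighbors (a tridiagonal system in \(s\)), so the multi-scale solution is available in closed form at negligible extra cost, independently of the kernel.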
Feb 11 2014 cs.CV
In this paper, we propose a novel binary-based cost computation and aggregation approach for the stereo matching problem. The cost volume is constructed through bitwise operations on a series of binary strings. This approach is then combined with the traditional winner-take-all strategy, resulting in a new local stereo matching algorithm called binary stereo matching (BSM). Since the core algorithm of BSM is based on binary and integer computations, it has a higher computational efficiency than previous methods. Experimental results on the Middlebury benchmark show that BSM has comparable performance with state-of-the-art local stereo methods in terms of both quality and speed. Furthermore, experiments on images with radiometric differences demonstrate that BSM is more robust than previous methods under such changes, which are common under real illumination.
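The bitwise flavor of such a cost can be illustrated with census codes, one standard binary descriptor (the paper's specific binary strings may differ): matching costs are Hamming distances computed with XOR plus popcount, followed by winner-take-all over disparities.

```python
import numpy as np

def census(img):
    """3x3 census transform: each pixel becomes an 8-bit code of neighbor comparisons."""
    code = np.zeros(img.shape, dtype=np.uint32)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
            code = (code << 1) | (shifted < img)
    return code

def disparity_wta(left, right, max_disp=16):
    """Binary matching cost = Hamming distance between census codes (XOR + popcount),
    followed by a winner-take-all decision over disparities."""
    cl, cr = census(left), census(right)
    h, w = left.shape
    cost = np.full((max_disp, h, w), 255, dtype=np.uint16)
    for d in range(max_disp):
        x = cl[:, d:] ^ cr[:, :w - d]                    # bitwise XOR of the codes
        bits = np.unpackbits(x.view(np.uint8), axis=-1)  # popcount via bit unpacking
        cost[d, :, d:] = bits.reshape(h, w - d, -1).sum(-1)
    return cost.argmin(axis=0)

rng = np.random.default_rng(5)
left = rng.integers(0, 255, (48, 64)).astype(np.uint8)
right = np.roll(left, -4, axis=1)               # synthetic shift: true disparity is 4
print(np.median(disparity_wta(left, right)))    # ~4 away from the image borders
```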
Jan 29 2014 cs.NI
Measurements show that 85% of TCP flows in the Internet are short-lived flows that spend most of their lifetime in the TCP startup phase. However, many previous studies indicate that the traditional TCP Slow Start algorithm does not perform well, especially in long fat networks. Two well-known problems impact Slow Start performance: the blind initial setting of the Slow Start threshold and the aggressive increase of the probing rate during the startup phase regardless of the buffer sizes along the path. Current efforts focused on tuning the Slow Start threshold and/or the probing rate during the startup phase have not been considered very effective, which prompted an investigation with a different approach. In this paper, we present a novel TCP startup method, called threshold-less slow start or SSthreshless Start, which does not need the Slow Start threshold to operate. Instead, SSthreshless Start uses the backlog status at the bottleneck buffer to adaptively adjust the probing rate, which allows better seizing of the available bandwidth. Compared to the traditional and other major modified startup methods, our simulation results show that SSthreshless Start achieves significant performance improvement during the startup phase. Moreover, SSthreshless Start scales well over a wide range of buffer sizes, propagation delays and network bandwidths. Besides, it shows excellent friendliness when operating simultaneously with the currently popular TCP NewReno connections.
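The abstract does not spell out the control law, but the idea of replacing the blind threshold with bottleneck-buffer feedback can be sketched using a Vegas-style backlog estimate from RTT inflation; the estimator, gains, targets and toy bottleneck below are all our assumptions, not the SSthreshless Start specification.

```python
def startup_step(cwnd, rtt, rtt_min, target_backlog=8.0, gain=0.5):
    """One step of a backlog-driven startup sketch: estimate the packets queued at
    the bottleneck from RTT inflation, grow fast while the buffer is nearly empty,
    and ease off as it fills -- no slow start threshold involved."""
    backlog = cwnd * (1.0 - rtt_min / rtt)        # estimated standing queue, in packets
    return max(cwnd + gain * (target_backlog - backlog), 1.0)

# toy single-bottleneck loop: capacity of 100 packets per RTT, queueing inflates delay
cwnd, rtt_min, capacity = 10.0, 0.05, 100.0
for _ in range(60):
    queue = max(cwnd - capacity, 0.0)             # packets beyond one bandwidth-delay product
    rtt = rtt_min * (1.0 + queue / capacity)
    cwnd = startup_step(cwnd, rtt, rtt_min)
print(round(cwnd))   # settles near capacity plus the small target backlog
```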
X-ray computed tomography at the nanometer scale (nano-CT) offers a wide range of applications in scientific and industrial areas. Here we describe a reliable, user-friendly and fast software package based on LabVIEW that performs all procedures after the acquisition of the raw projection images in order to obtain the inner structure of the investigated sample. A suitable image alignment process addressing misalignment problems among image series, due to mechanical manufacturing errors, thermal expansion and other external factors, has been implemented together with a novel fast parallel-beam 3D reconstruction procedure, developed ad hoc to perform the tomographic reconstruction. Markedly improved reconstruction results obtained at the Beijing Synchrotron Radiation Facility after the image calibration confirmed the fundamental role of this image alignment procedure, which minimizes the unwanted blur and additional streaking artifacts otherwise present in reconstructed slices. Moreover, this nano-CT image alignment and its associated 3D reconstruction procedure, fully based on LabVIEW routines, significantly reduce the data post-processing cycle, making the users' work during experimental runs faster and easier.
Polarization mode dispersion (PMD) is a challenge for high-data-rate optical communication systems. Further research is needed on the impairments induced by PMD in high-speed optical orthogonal frequency division multiplexing (OFDM) transmission systems. In this paper, an approximate analytical method is presented for evaluating the power penalty due to first-order PMD in optical OFDM with quadrature amplitude modulation (OFDM/QAM) and in filter bank based multi-carrier with offset quadrature amplitude modulation (FBMC/OQAM) transmission systems. The simulation results show that, compared with single carrier with quadrature phase shift keying (SC-QPSK), both OFDM/QAM and FBMC/OQAM can halve the power penalty caused by PMD. Furthermore, FBMC/OQAM shows better power penalty immunity than OFDM/QAM under the influence of first-order PMD.
Nov 11 2013 cs.CV
In this paper, we present a simple yet fast and robust algorithm which exploits the spatio-temporal context for visual tracking. Our approach formulates the spatio-temporal relationships between the object of interest and its local context in a Bayesian framework, which models the statistical correlation between the low-level features (i.e., image intensity and position) of the target and its surrounding regions. The tracking problem is then posed as computing a confidence map and obtaining the best target location by maximizing an object location likelihood function. The Fast Fourier Transform is adopted for fast learning and detection in this work. Implemented in MATLAB without code optimization, the proposed tracker runs at 350 frames per second on an i7 machine. Extensive experimental results show that the proposed algorithm performs favorably against state-of-the-art methods in terms of efficiency, accuracy and robustness.
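The role of the Fast Fourier Transform can be seen in a stripped-down sketch: the confidence map is a circular cross-correlation computed in the Fourier domain, and the new target location is its maximum. This omits the learned spatio-temporal context model and its spatial weighting; the plain template-matching form below is an illustrative stand-in.

```python
import numpy as np

def confidence_map(context, template):
    """Confidence map as circular cross-correlation computed via the FFT (O(n log n))."""
    F = np.fft.fft2
    return np.real(np.fft.ifft2(F(context) * np.conj(F(template, s=context.shape))))

rng = np.random.default_rng(4)
template = rng.standard_normal((16, 16))
context = 0.1 * rng.standard_normal((64, 64))
context[20:36, 30:46] += template                   # plant the target at (20, 30)

conf = confidence_map(context - context.mean(), template - template.mean())
print(np.unravel_index(conf.argmax(), conf.shape))  # -> (20, 30)
```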