results for au:Wang_J in:cs

- Person re-identification (Re-ID) usually suffers from noisy samples with background clutter and mutual occlusion, which make it extremely difficult to distinguish different individuals across disjoint camera views. In this paper, we propose a novel deep self-paced learning (DSPL) algorithm to alleviate this problem, in which we apply a self-paced constraint and symmetric regularization to guide the training of the deep neural network under a relative distance metric, so as to learn stable and discriminative features for person Re-ID. First, we propose a soft polynomial regularizer term which derives adaptive weights for samples based on both the training loss and the model age. As a result, high-confidence fidelity samples are emphasized and low-confidence noisy samples are suppressed at the early stage of the training process. This learning regime is naturally implemented under a self-paced learning (SPL) framework, in which sample weights are adaptively updated based on both model age and sample loss using an alternating optimization method. Second, we introduce a symmetric regularizer term to revise the asymmetric gradient back-propagation induced by the relative distance metric, so as to simultaneously minimize the intra-class distance and maximize the inter-class distance in each triplet unit. Finally, we build a part-based deep neural network, in which the features of different body parts are first discriminatively learned in the lower convolutional layers and then fused in the higher fully connected layers. Experiments on several benchmark datasets have demonstrated the superior performance of our method compared with state-of-the-art approaches.
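As a minimal illustration of the self-paced idea in the abstract above (not the paper's exact soft polynomial regularizer — the weight form and the names `age` and `t` here are assumptions), a soft weight that suppresses high-loss samples early and relaxes as the model matures can be sketched as:

```python
def spl_weight(loss, age, t=2.0):
    """Hypothetical soft self-paced weight.

    Samples whose loss exceeds the current pace parameter `age` get
    weight 0; below the threshold, the weight rises polynomially
    toward 1 (exponent controlled by t > 1).
    """
    if loss >= age:
        return 0.0
    return (1.0 - loss / age) ** (1.0 / (t - 1.0))
```

Early in training (`age` small) only low-loss samples receive non-zero weight; raising `age` gradually admits harder, noisier samples into the objective.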
- Oct 17 2017 cs.CV arXiv:1710.05193v1 This paper casts multi-view registration as a clustering problem, which can be solved by the proposed approach based on K-means clustering. For the clustering, all the centroids are uniformly sampled from the initially aligned point sets involved in the multi-view registration. Then, the two standard K-means steps are applied to assign each point to a cluster and to update each cluster centroid. Subsequently, the shape composed of all cluster centroids can be used to sequentially estimate the rigid transformation for each point set. To obtain accurate results, the K-means steps and transformation estimation should be applied alternately and iteratively to all point sets. Finally, the proposed approach was tested on several public datasets and compared with state-of-the-art algorithms. Experimental results illustrate its good efficiency for the registration of multi-view point sets.
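The two standard K-means steps named in the abstract can be sketched in a few lines (2-D points for brevity; the per-set rigid-transformation estimation that alternates with these steps is omitted):

```python
def assign_points(points, centroids):
    """K-means step 1: assign each point to its nearest centroid."""
    labels = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels

def update_centroids(points, labels, k):
    """K-means step 2: recompute each centroid as its cluster's mean."""
    sums = [[0.0, 0.0] for _ in range(k)]
    counts = [0] * k
    for p, l in zip(points, labels):
        sums[l][0] += p[0]
        sums[l][1] += p[1]
        counts[l] += 1
    return [[s[0] / c, s[1] / c] if c else s for s, c in zip(sums, counts)]
```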
- Consider a generalized multiterminal source coding system, where $\ell\choose m$ encoders, each observing a distinct size-$m$ subset of $\ell$ ($\ell\geq 2$) zero-mean unit-variance symmetrically correlated Gaussian sources with correlation coefficient $\rho$, compress their observations in such a way that a joint decoder can reconstruct the sources within a prescribed mean squared error distortion based on the compressed data. The optimal rate-distortion performance of this system was previously known only for the two extreme cases $m=\ell$ (the centralized case) and $m=1$ (the distributed case), and except when $\rho=0$, the centralized system can achieve strictly lower compression rates than the distributed system under all non-trivial distortion constraints. Somewhat surprisingly, it is established in the present paper that the optimal rate-distortion performance of the afore-described generalized multiterminal source coding system with $m\geq 2$ coincides with that of the centralized system for all distortions when $\rho\leq 0$ and for distortions below an explicit positive threshold (depending on $m$) when $\rho>0$. Moreover, when $\rho>0$, the minimum achievable rate of generalized multiterminal source coding subject to an arbitrary positive distortion constraint $d$ is shown to be within a finite gap (depending on $m$ and $d$) from its centralized counterpart in the large $\ell$ limit except for possibly the critical distortion $d=1-\rho$.
- Oct 06 2017 cs.CV arXiv:1710.02139v1 Multi-face tracking in unconstrained videos is a challenging problem as faces of one person often appear drastically different in multiple shots due to significant variations in scale, pose, expression, illumination, and make-up. Existing multi-target tracking methods often use low-level features which are not sufficiently discriminative for identifying faces with such large appearance variations. In this paper, we tackle this problem by learning discriminative, video-specific face representations using convolutional neural networks (CNNs). Unlike existing CNN-based approaches which are only trained on large-scale face image datasets offline, we use the contextual constraints to generate a large number of training samples for a given video, and further adapt the pre-trained face CNN to specific videos using discovered training samples. Using these training samples, we optimize the embedding space so that the Euclidean distances correspond to a measure of semantic face similarity via minimizing a triplet loss function. With the learned discriminative features, we apply the hierarchical clustering algorithm to link tracklets across multiple shots to generate trajectories. We extensively evaluate the proposed algorithm on two sets of TV sitcoms and YouTube music videos, analyze the contribution of each component, and demonstrate significant performance improvement over existing techniques.
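The triplet objective mentioned above reduces, in its simplest hinge form, to pulling a positive embedding closer to the anchor than a negative by at least a margin (a minimal sketch on plain Euclidean embeddings, not the paper's full training pipeline):

```python
def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss: penalize when the anchor-positive squared
    distance is not smaller than anchor-negative by at least `margin`."""
    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return max(0.0, sqdist(anchor, positive) - sqdist(anchor, negative) + margin)
```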
- Radiomics aims to extract and analyze large numbers of quantitative features from medical images and is highly promising for staging, diagnosing, and predicting outcomes of cancer treatments. Nevertheless, several challenges need to be addressed to construct an optimal radiomics predictive model. First, the predictive performance of the model may be reduced when features extracted from individual imaging modalities are blindly combined into a single predictive model. Second, because many different types of classifiers are available to construct a predictive model, selecting an optimal classifier for a particular application is still challenging. In this work, we developed multi-modality, multi-classifier radiomics predictive models that address these issues. Specifically, a new reliable classifier fusion strategy was proposed to optimally combine outputs from different modalities and classifiers. In this strategy, modality-specific classifiers were first trained, and an analytic evidential reasoning (ER) rule was developed to fuse the output score from each modality to construct an optimal predictive model. One public data set and two clinical case studies were used to validate model performance. The experimental results indicated that the proposed ER-rule-based radiomics models outperformed traditional models that rely on a single classifier or simply use combined features from different modalities.
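The fusion step can be pictured with a much simpler stand-in than the paper's analytic ER rule — a reliability-weighted average of per-modality scores (the weighting scheme here is an assumed simplification, shown only to fix the idea of discounting less reliable modalities):

```python
def fuse_scores(scores, reliabilities):
    """Reliability-weighted fusion of per-modality classifier scores.

    Each modality's output score is discounted by its reliability
    weight before combination (a simplified proxy for evidential
    reasoning, not the paper's actual ER rule).
    """
    total = sum(reliabilities)
    return sum(s * r for s, r in zip(scores, reliabilities)) / total
```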
- Our task is to generate an effective summary for a given document under specific real-time requirements. We use the softplus function to enhance keyword rankings to favor important sentences, based on which we present a number of summarization algorithms using various keyword extraction and topic clustering methods. We show that our algorithms meet the real-time requirements and yield the best ROUGE recall scores on DUC-02 over all previously known algorithms. To evaluate the quality of summaries without human-generated benchmarks, we define a measure called WESM based on word embeddings using Word Mover's Distance. We show that the orderings of the ROUGE and WESM scores of our algorithms are highly comparable, suggesting that WESM may serve as a viable alternative for measuring the quality of a summary.
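The softplus-enhanced ranking can be sketched directly; softplus is simply ln(1 + e^x), and a sentence score might sum the softplus of its keywords' raw ranks (the `keyword_rank` map and the summation over words are assumptions for illustration, not the paper's exact scoring):

```python
import math

def softplus(x):
    """softplus(x) = ln(1 + e^x), a smooth, always-positive rectifier."""
    return math.log1p(math.exp(x))

def sentence_score(sentence, keyword_rank):
    """Score a sentence by the softplus-enhanced ranks of its words;
    words absent from keyword_rank contribute softplus(0) = ln 2."""
    return sum(softplus(keyword_rank.get(w, 0.0))
               for w in sentence.lower().split())
```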
- Oct 03 2017 cs.DB arXiv:1710.00560v1 Time series data have exploded due to the popularity of new applications, such as data center management and IoT. Time series database management systems (TSDBs) have emerged to store and query large volumes of time series data. Subsequence matching is critical in many time series mining algorithms, and extensive approaches have been proposed. However, the shift to distributed storage systems and the performance gap make these approaches incompatible with TSDBs. To fill this gap, we propose a new index structure, KV-index, and the corresponding matching algorithm, KV-match. KV-index is a file-based structure, which can be easily implemented on local files, HDFS or HBase tables. The KV-match algorithm probes the index efficiently with a few sequential scans. Moreover, two optimization techniques, window reduction and window reordering, are proposed to further accelerate the processing. To support queries of arbitrary lengths, we extend KV-match to KV-match$_{DP}$, which utilizes multiple varied-length indexes to process the query simultaneously. A two-dimensional dynamic programming algorithm is proposed to find the optimal query segmentation. We implement our approach on both local files and HBase tables, and conduct extensive experiments on synthetic and real-world datasets. Results show that our index is of comparable size to the popular tree-style index, while our query processing is orders of magnitude more efficient.
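A toy, in-memory sketch of the key-value indexing idea: map each sliding window to a coarse key and store the start offsets under that key, then probe with the query's key. The choice of quantized window means as keys is an assumed simplification, not the paper's actual key design:

```python
def build_kv_index(series, w, step=0.5):
    """Map each length-w window's quantized mean to its start offsets
    (a toy stand-in for a file- or HBase-backed KV-index)."""
    index = {}
    for i in range(len(series) - w + 1):
        key = round(sum(series[i:i + w]) / w / step)
        index.setdefault(key, []).append(i)
    return index

def probe(index, query, w, step=0.5):
    """Return candidate offsets whose window key matches the query's;
    a real matcher would verify candidates against the raw series."""
    key = round(sum(query) / w / step)
    return index.get(key, [])
```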
- We study automatic title generation for a given block of text and present a method called DTATG to generate titles. DTATG first extracts a small number of central sentences that convey the main meanings of the text and are in a suitable structure for conversion into a title. DTATG then constructs a dependency tree for each of these sentences and removes certain branches using a Dependency Tree Compression Model we devise. We also devise a title test to determine if a sentence can be used as a title. If a trimmed sentence passes the title test, then it becomes a title candidate. DTATG selects the title candidate with the highest ranking score as the final title. Our experiments showed that DTATG can generate adequate titles. We also showed that DTATG-generated titles have higher F1 scores than those generated by the previous methods.
- We propose a deep learning-based approach to the problem of premise selection: selecting mathematical statements relevant for proving a given conjecture. We represent a higher-order logic formula as a graph that is invariant to variable renaming but still fully preserves syntactic and semantic information. We then embed the graph into a vector via a novel embedding method that preserves the information of edge ordering. Our approach achieves state-of-the-art results on the HolStep dataset, improving the classification accuracy from 83% to 90.3%.
- Sep 28 2017 cs.CR arXiv:1709.09454v1 In social networks, different users may have different privacy preferences, and many users have public identities. Most work on differentially private social network data publication neglects this fact. We aim to release the number of public users that a private user connects to within n hops, called n-range Connection fingerprints (CFPs), under user-level personalized privacy preferences. We propose two schemes, Distance-based exponential budget absorption (DEBA) and Distance-based uniform budget absorption using the Ladder function (DUBA-LF), for privacy-preserving publication of the CFPs based on Personalized differential privacy (PDP), and we conduct a theoretical analysis of the privacy guarantees provided by the proposed schemes. The implementation showed that the proposed schemes achieve lower publication errors on real datasets.
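Underlying both schemes is the basic mechanism of differential privacy: releasing a count perturbed with Laplace noise calibrated to a privacy budget epsilon (per-user in the personalized setting). A minimal sketch of that primitive — not the DEBA/DUBA-LF budget-absorption logic itself:

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) via inverse-CDF from a uniform draw."""
    u = rng.random() - 0.5
    sign = 1 if u >= 0 else -1
    return -scale * sign * math.log(1 - 2 * abs(u))

def noisy_count(true_count, epsilon, rng=None):
    """Release a count under epsilon-differential privacy with the
    Laplace mechanism (sensitivity 1); smaller epsilon = more noise."""
    rng = rng or random.Random()
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

In a personalized scheme, each user's own epsilon would determine the noise scale applied to the counts they appear in.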
- Automatically generating coherent and semantically meaningful text has many applications in machine translation, dialogue systems, image captioning, etc. Recently, Generative Adversarial Nets (GANs) that combine policy gradient with a discriminative model guiding the training of the generative model as a reinforcement learning policy have shown promising results in text generation. However, the scalar guiding signal is only available after the entire text has been generated and lacks intermediate information about text structure during the generative process. This limits their success when the generated text samples are long (more than 20 words). In this paper, we propose a new framework, called LeakGAN, to address this problem in long text generation. We allow the discriminative net to leak its own high-level extracted features to the generative net to further aid the guidance. The generator incorporates such informative signals into all generation steps through an additional Manager module, which takes the extracted features of the currently generated words and outputs a latent vector to guide the Worker module for next-word generation. Our extensive experiments on synthetic data and various real-world tasks with the Turing test demonstrate that LeakGAN is highly effective in long text generation and also improves performance in short text generation scenarios. More importantly, without any supervision, LeakGAN is able to implicitly learn sentence structures solely through the interaction between the Manager and the Worker.
- Sep 26 2017 cs.CV arXiv:1709.08553v1 Recognising semantic pedestrian attributes in surveillance images is a challenging task for computer vision, particularly when the imaging quality is poor with complex background clutter and uncontrolled viewing conditions, and the amount of labelled training data is small. In this work, we formulate a Joint Recurrent Learning (JRL) model for exploring attribute context and correlation in order to improve attribute recognition given small-sized training data with poor-quality images. The JRL model jointly learns pedestrian attribute correlations in a pedestrian image, and in particular their sequential ordering dependencies (latent high-order correlation), in an end-to-end encoder/decoder recurrent network. We demonstrate the performance advantage and robustness of the JRL model over a wide range of state-of-the-art deep models for pedestrian attribute recognition, multi-label image classification, and multi-person image annotation on the two largest pedestrian attribute benchmarks, PETA and RAP.
- Sep 26 2017 cs.CV arXiv:1709.08142v1 This paper proposes a method for video smoke detection using synthetic smoke samples. Virtual data can automatically offer precise and rich annotated samples. However, the learning of smoke representations is hurt by the appearance gap between real and synthetic smoke samples. Existing research mainly works on adaptation to samples extracted from the original annotated samples, treating object detection and domain adaptation as two independent parts. To train a strong detector with rich synthetic samples, we construct the adaptation on the detection layer of state-of-the-art single-model detectors (SSD and MS-CNN). The training procedure is end-to-end; classification, localization and adaptation are combined in the learning. The performance of the proposed model surpasses the original baseline in our experiments. Meanwhile, our results show that detectors based on adversarial adaptation are superior to detectors based on discrepancy adaptation. Code will be made publicly available at http://smoke.ustc.edu.cn.
- Sep 26 2017 cs.CV arXiv:1709.08393v1 Recently, low rank and sparse (LRS) matrix decomposition has been introduced as an effective means to solve multi-view registration. However, this method presents two notable disadvantages: the registration result is quite sensitive to the sparsity of the LRS matrix, and the decomposition process treats each block element equally in spite of their varying reliability. Therefore, this paper first proposes a matrix completion method based on the overlap percentage of scan pairs. By completing the LRS matrix with as many reliable block elements as possible, more synchronization constraints on relative motions can be utilized for registration. Furthermore, it is observed that the reliability of each element in the LRS matrix can be weighed by the relationship between its corresponding model and data shapes. Therefore, a weight matrix is designed to measure the contribution of each element to the decomposition, and accordingly the decomposition result is closer to the ground truth than before. Benefiting from the more informative LRS matrix as well as the weight matrix, experimental results conducted on several public datasets demonstrate the superiority of the proposed approach over other methods in both accuracy and robustness.
- Sep 22 2017 cs.CV arXiv:1709.07220v1 In this paper, we address the problem of estimating the positions of human joints, i.e., articulated pose estimation. Recent state-of-the-art solutions model two key issues, joint detection and spatial configuration refinement, together using convolutional neural networks. Our work mainly focuses on spatial configuration refinement by statistically reducing variations of human poses, motivated by the observation that the scattered distribution of the relative locations of joints (e.g., the left wrist is distributed nearly uniformly in a circular area around the left shoulder) makes the learning of convolutional spatial models hard. We present a two-stage normalization scheme, human body normalization and limb normalization, to make the distribution of the relative joint locations compact, resulting in easier learning of convolutional spatial models and more accurate pose estimation. In addition, our empirical results show that incorporating multi-scale supervision and multi-scale fusion into the joint detection network is beneficial. Experimental results demonstrate that our method consistently outperforms state-of-the-art methods on the benchmarks.
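The body-normalization idea can be illustrated with a few lines: translate joints to a body-centered origin and divide by a body scale such as torso length (a simplified take; the paper's two-stage scheme with limb normalization is more involved):

```python
def normalize_pose(joints, center, scale):
    """Map joint coordinates to a body-centered, scale-free frame:
    subtract a reference point (e.g., torso center) and divide by a
    body scale (e.g., torso length)."""
    return [((x - center[0]) / scale, (y - center[1]) / scale)
            for x, y in joints]
```

After normalization, the relative joint locations across people of different sizes and positions become directly comparable, which is what makes the spatial model easier to learn.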
- Sep 20 2017 cs.CV arXiv:1709.06247v1 With the rapid development of Deep Convolutional Neural Networks (DCNNs), numerous works have focused on designing better network architectures (e.g., AlexNet, VGG, Inception, ResNet, and DenseNet). Nevertheless, all these networks share the same characteristic: each convolutional layer is followed by an activation layer, most commonly a Rectified Linear Unit (ReLU) layer. In this work, we argue that the paired module with a 1:1 convolution-to-ReLU ratio is not the best choice, since it may result in poor generalization ability. Thus, we investigate more suitable convolution-to-ReLU ratios in search of better network architectures. Specifically, inspired by Leaky ReLU, we adopt a proportional module with an N:M (N$>$M) convolution-to-ReLU ratio to design better networks. From the perspective of ensemble learning, Leaky ReLU can be considered as an ensemble of networks with different convolution-to-ReLU ratios. Through the analysis of a simple Leaky ReLU model, we find that the proportional module with an N:M (N$>$M) ratio helps networks achieve better performance. By utilizing this proportional module, many popular networks can form richer representations, since the N:M (N$>$M) module can utilize information more effectively. Furthermore, we apply this module to diverse DCNN models to explore whether the N:M (N$>$M) convolution-to-ReLU ratio is indeed more effective. Our experimental results show that this simple yet effective method achieves better performance on different benchmarks with various network architectures, verifying the superiority of the proportional module.
- Sep 20 2017 cs.CV arXiv:1709.06391v1 For effective human-robot interaction, it is important that a robotic assistant can forecast the next action a human will consider in a given task. Unfortunately, real-world tasks are often very long, complex, and repetitive; as a result, forecasting is not trivial. In this paper, we propose a novel deep recurrent architecture that takes as input features from a two-stream Residual action recognition framework, and learns to estimate the progress of human activities from video sequences -- this surrogate progress estimation task implicitly learns a temporal task grammar with respect to which activities can be localized and forecasted. To learn the task grammar, we propose a stacked-LSTM-based multi-granularity progress estimation framework that uses a novel cumulative Euclidean loss as its objective. To demonstrate the effectiveness of our proposed architecture, we showcase experiments on two challenging robotic assistive tasks, namely (i) assembling an Ikea table from its constituents, and (ii) changing the tires of a car. Our results demonstrate that learning task grammars offers highly discriminative cues that improve the forecasting accuracy by more than 9% over the baseline two-stream forecasting model, while also outperforming other competitive schemes.
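A progress-estimation objective of this flavor can be sketched as accumulating squared errors between predicted and ground-truth progress (monotone 0 to 1) across the timeline. This is an assumed simplification of the paper's cumulative Euclidean loss, shown only to fix the idea:

```python
def cumulative_euclidean_loss(pred_progress, true_progress):
    """Sum of squared errors between predicted and ground-truth task
    progress, accumulated over the video frames (hypothetical form)."""
    return sum((p - t) ** 2 for p, t in zip(pred_progress, true_progress))
```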
- In this paper, we conduct an empirical study on discovering the ordered collective dynamics of a population of artificial intelligence (AI) agents. Our intention is to put AI agents into a simulated natural context and then to understand their induced dynamics at the population level. In particular, we aim to verify whether the principles developed in the real world could also be used to understand an artificially created intelligent population. To achieve this, we simulate a large-scale predator-prey world, where the laws of the world are designed using only findings, or their logical equivalents, that have been discovered in nature. We endow the agents with intelligence based on deep reinforcement learning, and scale the population size up to millions. Our results show that the population dynamics of AI agents, driven only by each agent's individual self-interest, reveal an ordered pattern similar to the Lotka-Volterra model studied in population biology. We further discover emergent behaviors of collective adaptation when studying how the agents' grouping behaviors change with environmental resources. Both findings can be explained by the self-organization theory observed in nature.
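For reference, the Lotka-Volterra model the observed dynamics resemble is the classic pair of coupled ODEs for prey density x and predator density y; a minimal forward-Euler integration:

```python
def lotka_volterra(x, y, alpha, beta, delta, gamma, dt, steps):
    """Forward-Euler integration of the Lotka-Volterra equations:
    dx/dt = alpha*x - beta*x*y   (prey growth minus predation)
    dy/dt = delta*x*y - gamma*y  (predator growth minus death)."""
    history = [(x, y)]
    for _ in range(steps):
        dx = (alpha * x - beta * x * y) * dt
        dy = (delta * x * y - gamma * y) * dt
        x, y = x + dx, y + dy
        history.append((x, y))
    return history
```

At the non-trivial fixed point (x, y) = (gamma/delta, alpha/beta) both derivatives vanish, so the populations stay constant; away from it they oscillate.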
- Molecular communication (MC) is a communication technology based on biochemical molecules for the Internet of Bio-Nano Things, in which biochemical molecules are used as information carriers for the interconnection of nano-devices. In this paper, the basic principles of diffusion-based MC and the corresponding key technologies are comprehensively surveyed. In particular, state-of-the-art achievements related to diffusion-based MC are discussed and compared, including the system model, system performance analysis with key influencing factors, and information coding and modulation techniques. Meanwhile, the multi-hop nano-network based on diffusion MC is presented as well. Additionally, given the extensiveness of the research area, open issues and challenges are presented to spur future investigations, covering channel models, information theory, self-organizing nano-networks, and biochemical applications.
- Sep 15 2017 cs.CV arXiv:1709.04577v1 In this paper, we study the task of detecting semantic parts of an object. This is very important in computer vision, as it provides the possibility to parse an object as humans do, and helps us better understand object detection algorithms. Detecting semantic parts is also very challenging, especially when the parts are partially or fully occluded. In this scenario, popular proposal-based methods like Faster-RCNN often produce unsatisfactory results, because both the proposal extraction and classification stages may be confused by irrelevant occluders. To this end, we propose a novel detection framework, named DeepVoting, which accumulates local visual cues, called visual concepts (VCs), to locate the semantic parts. Our approach involves adding two layers after the intermediate outputs of a deep neural network. The first layer extracts VC responses, and the second layer performs a voting mechanism to capture the spatial relationship between VCs and semantic parts. The benefit is that each semantic part is supported by multiple VCs. Even if some of the supporting VCs are missing due to occlusion, we can still infer the presence of the target semantic part using the remaining ones. To avoid generating an exponentially large training set to cover all occlusion cases, we train our model without seeing occlusions and transfer the learned knowledge to deal with them. This setting favors learning models that are naturally robust and adaptive to occlusions instead of over-fitting the occlusion patterns in the training data. In experiments, DeepVoting shows significantly better performance on semantic part detection in occlusion scenarios compared with Faster-RCNN, with one order of magnitude fewer parameters and 2.5x faster testing speed. In addition, DeepVoting is explainable, as the detection result can be diagnosed by looking up the voted VCs.
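The voting mechanism can be pictured with a toy accumulator: each detected visual concept casts a vote at its location plus a learned spatial offset, and the part is hypothesized where votes pile up. The offset table and grid discretization below are assumptions for illustration, not the paper's learned voting layer:

```python
def vote_for_part(vc_detections, offsets):
    """Accumulate spatial votes for a semantic part.

    vc_detections: list of (concept_id, x, y) detections.
    offsets: hypothetical learned map concept_id -> (dx, dy) pointing
             from the concept toward the part center.
    Returns the grid cell with the most votes and its vote count.
    """
    votes = {}
    for cid, x, y in vc_detections:
        ox, oy = offsets[cid]
        cell = (round(x + ox), round(y + oy))
        votes[cell] = votes.get(cell, 0) + 1
    best = max(votes, key=votes.get)
    return best, votes[best]
```

Because several concepts vote for the same cell, losing some of them to occlusion still leaves a peak at the part location.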
- Sep 14 2017 cs.CV arXiv:1709.04303v1 Reading text in the wild is a challenging task in the field of computer vision. Existing approaches mainly adopt Connectionist Temporal Classification (CTC) or attention models based on Recurrent Neural Networks (RNNs), which are computationally expensive and hard to train. In this paper, we present an end-to-end Attention Convolutional Network for scene text recognition. Firstly, instead of an RNN, we adopt stacked convolutional layers to effectively capture the contextual dependencies of the input sequence, which is characterized by lower computational complexity and easier parallel computation. Compared to the chain structure of recurrent networks, the Convolutional Neural Network (CNN) provides a natural way to capture long-term dependencies between elements, and is 9 times faster than Bidirectional Long Short-Term Memory (BLSTM). Furthermore, in order to enhance the representation of foreground text and suppress background noise, we incorporate residual attention modules into a small densely connected network to improve the discriminability of CNN features. We validate the performance of our approach on the standard benchmarks, including the Street View Text, IIIT5K and ICDAR datasets. The resulting state-of-the-art or highly competitive performance and efficiency show the superiority of the proposed approach.
- Sep 13 2017 cs.CV arXiv:1709.03656v1 Due to the existence of various views or representations in many real-world data, multi-view learning has drawn much attention recently. Multi-view spectral clustering methods based on similarity matrices or graphs are quite popular. Generally, these algorithms learn informative graphs by directly utilizing the original data. However, in real-world applications, the original data often contain noise and outliers that lead to unreliable graphs. In addition, different views may make different contributions to data clustering. In this paper, a novel Multiview Subspace Clustering method unifying Adaptive neighbours and Metric learning (MSCAM) is proposed to address the above problems. In this method, we use the subspace representations of different views to adaptively learn a consensus similarity matrix, uncovering the subspace structure and avoiding the noisy nature of the original data. For all views, we also learn different Mahalanobis matrices that parameterize the squared distances and account for the contributions of different views. Further, we constrain the graph constructed from the similarity matrix to have exactly c connected components (where c is the number of clusters). An iterative algorithm is developed to solve this optimization problem. Moreover, experiments on a synthetic dataset and several real-world datasets demonstrate the effectiveness of MSCAM.
- In OFDM relay networks with IQ imbalances and a full-duplex relay station (RS), how to optimize the pilot pattern and power allocation under the criterion of minimizing the sum of mean square errors (Sum-MSE) of the frequency-domain least-squares channel estimator has a heavy impact on self-interference cancellation. Firstly, the pilot pattern design is cast as a convex optimization problem. From the KKT conditions, the optimal analytical expression is derived given fixed source power and RS power. Subsequently, an optimal power allocation (OPA) strategy is proposed to further alleviate the Sum-MSE under a total transmit power constraint on the source node and RS. Simulation results show that the proposed OPA performs better than equal power allocation (EPA) in terms of Sum-MSE, and the Sum-MSE performance gain grows as $\rho$ deviates from the value $\rho^o$ minimizing the Sum-MSE, where $\rho$ is defined as the average ratio of the residual SI channel at the RS to the intended channel from source to RS. For example, the OPA achieves about a 5dB SNR gain over EPA when shrinking or stretching $\rho$ by a factor of $4$. More importantly, as $\rho$ decreases or increases further, the performance gain becomes more significant.
- The application of hybrid precoding in millimeter wave (mmWave) multiple-input multiple-output (MIMO) systems has been proven effective for reducing the number of radio frequency (RF) chains. However, the maximum number of independent data streams is conventionally restricted by the number of RF chains, which limits the spatial multiplexing gain. To further improve the achievable spectral efficiency (SE), in this paper we propose a novel generalized spatial modulation (GenSM) aided mmWave MIMO system that conveys an extra data stream via the index of the active antenna group, while no extra RF chain is required. Moreover, we also propose a hybrid analog and digital precoding scheme for SE maximization. More specifically, a closed-form lower bound is first derived to quantify the achievable SE of the proposed system. Using this lower bound as the cost function, a two-step algorithm is proposed to optimize the hybrid precoder. The proposed algorithm not only exploits the concavity of the cost function over the digital power allocation vector, but also invokes a convex $\ell_\infty$ relaxation to handle the non-convex constraint imposed by analog precoding. Finally, the proposed scheme is shown via simulations to outperform state-of-the-art mmWave MIMO schemes in terms of achievable SE.
- Generative modeling, which learns a joint probability distribution from training data and generates samples according to it, is an important task in machine learning and artificial intelligence. Inspired by the probabilistic interpretation of quantum physics, we propose a generative model using matrix product states, a tensor network originally proposed for describing (particularly one-dimensional) entangled quantum states. Our model enjoys efficient learning by utilizing the density matrix renormalization group method, which allows dynamically adjusting the dimensions of the tensors, and offers an efficient direct sampling approach, Zipper, for generative tasks. We apply our method to generative modeling of several standard datasets, including the principled Bars and Stripes, random binary patterns and the MNIST handwritten digits, to illustrate the ability of our model, and discuss its features as well as drawbacks relative to popular generative models such as the Hopfield model, Boltzmann machines and generative adversarial networks. Our work sheds light on many interesting directions for future exploration in the development of quantum-inspired algorithms for unsupervised machine learning, which may possibly be realized on a quantum device.
- Aug 31 2017 physics.soc-ph cs.SY arXiv:1708.08983v1 With the fast-growing wind power penetration, additional flexibility is of great need for helping power systems cope with wind power ramping. Several electricity markets have established requirements for flexible ramping capacity (FRC) reserves. This paper addresses two crucial issues which have been rarely discussed in the literature: 1) how to model wind power ramping under different forecast values; 2) how to achieve a reasonable trade-off between operational risks and FRC costs. To answer the first question, this paper proposes a concept of conditional distributions of wind power ramping, which is empirically verified by using real-world data. Regarding the second question, this paper develops an adjustable chance-constrained approach to optimally allocate FRC reserves. Equivalent tractable forms of the original problem are devised to improve computational efficiency. Tests carried out on a modified IEEE 118-bus system demonstrate the effectiveness and efficiency of the proposed method.
- Aug 29 2017 cs.PL arXiv:1708.08323v1 Bounded model checking is among the most efficient techniques for the automatic verification of concurrent programs. However, encoding all possible interleavings often requires a huge and complex formula, which significantly limits scalability. This paper proposes a novel and efficient abstraction refinement method for multi-threaded program verification. Observing that the huge formula is usually dominated by the exact encoding of the scheduling constraint, this paper proposes a scheduling-constraint-based abstraction refinement method, which avoids the huge and complex encoding of BMC. In addition, to obtain an effective refinement, we have devised two graph-based algorithms over the event order graph for counterexample validation and refinement generation, which can always obtain a small yet effective refinement constraint. Enhanced by these two algorithms, we have proved that our method is sound and complete w.r.t. the given loop unwinding depth. Experimental results on SV-COMP concurrency benchmarks indicate that our method is promising and significantly outperforms the existing state-of-the-art tools.
- Image splicing detection is of fundamental importance in digital forensics and has therefore attracted increasing attention recently. In this paper, a color image splicing detection approach is proposed based on the Markov transition probability of quaternion component separation in the quaternion discrete cosine transform (QDCT) domain and the quaternion wavelet transform (QWT) domain. Firstly, intra-block and inter-block Markov features of the block QDCT coefficients are obtained from the real part and the three imaginary parts of the QDCT coefficients, respectively. Then, additional Markov features are extracted from the luminance (Y) channel in the quaternion wavelet transform domain to characterize the positional dependency among quaternion wavelet subband coefficients. Finally, an ensemble classifier (EC) is exploited to classify spliced and authentic color images. The experimental results demonstrate that the proposed approach can outperform some state-of-the-art methods.
- Aug 28 2017 cs.AI arXiv:1708.07775v1 Most natural language processing tasks can be formulated as approximate nearest neighbor search problems, such as word analogy, document similarity and machine translation. Take the question-answering task as an example: given a question as the query, the goal is to search for its nearest neighbor in the training dataset as the answer. However, existing methods for the approximate nearest neighbor search problem may not perform well owing to the following practical challenges: 1) there is noise in the data; 2) a large-scale dataset yields a huge retrieval space and high search time complexity. In order to solve these problems, we propose a novel approximate nearest neighbor search framework which i) projects the data onto a subspace based on spectral analysis, which eliminates the influence of noise; ii) partitions the training dataset into different groups in order to reduce the search space. Specifically, the retrieval space is reduced from $O(n)$ to $O(\log n)$ (where $n$ is the number of data points in the training dataset). We prove that the retrieved nearest neighbor in the projected subspace is the same as the one in the original feature space. We demonstrate the outstanding performance of our framework on real-world natural language processing tasks.
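A minimal numpy sketch of the two ingredients, spectral (PCA-style) projection followed by partitioned search, might look as follows; the dimensions, the cluster count, and the simple k-means-style partitioning are toy assumptions rather than the paper's exact construction:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))            # toy "training" embeddings
q = X[7] + 0.01 * rng.normal(size=50)      # a noisy query near point 7

# 1) project onto the top-d principal directions (spectral denoising)
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
P = Vt[:10].T                              # 50-d -> 10-d subspace
Z, zq = (X - mu) @ P, (q - mu) @ P

# 2) partition the projected data into k groups (a few k-means passes)
k = 8
centroids = Z[rng.choice(len(Z), size=k, replace=False)]
for _ in range(10):
    assign = np.argmin(((Z[:, None, :] - centroids) ** 2).sum(-1), axis=1)
    centroids = np.array([Z[assign == j].mean(0) if (assign == j).any()
                          else centroids[j] for j in range(k)])

# 3) search only within the query's cluster instead of the full dataset
c = int(np.argmin(((zq - centroids) ** 2).sum(-1)))
members = np.where(assign == c)[0]
nn = int(members[np.argmin(((Z[members] - zq) ** 2).sum(-1))])
```

With the query generated as a small perturbation of point 7, the partitioned search recovers it while scanning only one cluster.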
- Aug 23 2017 cs.NI arXiv:1708.06700v1 Joint User Selection and Energy Minimization for Ultra-Dense Multi-channel C-RAN with Incomplete CSI. This paper provides a unified framework to deal with the challenges arising in dense cloud radio access networks (C-RAN), which include huge power consumption, limited fronthaul capacity, heavy computational complexity, unavailability of full channel state information (CSI), etc. Specifically, we aim to jointly optimize the remote radio head (RRH) selection, user equipment (UE)-RRH associations and beam-vectors to minimize the total network power consumption (NPC) for dense multi-channel downlink C-RAN with incomplete CSI, subject to per-RRH power constraints, each UE's total rate requirement, and fronthaul link capacity constraints. This optimization problem is NP-hard. In addition, due to the incomplete CSI, the exact expression of the UEs' rates is intractable. We first conservatively replace the UEs' rate expression with its lower bound. Then, based on the successive convex approximation (SCA) technique and the relationship between the data rate and the mean square error (MSE), we propose a single-layer iterative algorithm to solve the NPC minimization problem with a convergence guarantee. In each iteration of the algorithm, the Lagrange dual decomposition method is used to derive the structure of the optimal beam-vectors, which facilitates parallel computation at the baseband unit (BBU) pool. Furthermore, a bisection UE selection algorithm is proposed to guarantee the feasibility of the problem. Simulation results show the benefits of the proposed algorithms and the fact that a limited amount of CSI is sufficient to achieve performance close to that obtained when perfect CSI is available.
- Blockchain technologies have gained massive momentum in the last few years. Blockchains are distributed ledgers that enable parties who do not fully trust each other to maintain a set of global states. The parties agree on the existence, values and histories of the states. As the technology landscape is expanding rapidly, it is both important and challenging to have a firm grasp of what the core technologies have to offer, especially with respect to their data processing capabilities. In this paper, we first survey the state of the art, focusing on private blockchains (in which parties are authenticated). We analyze both in-production and research systems in four dimensions: distributed ledger, cryptography, consensus protocol and smart contract. We then present BLOCKBENCH, a benchmarking framework for understanding the performance of private blockchains against data processing workloads. We conduct a comprehensive evaluation of three major blockchain systems based on BLOCKBENCH, namely Ethereum, Parity and Hyperledger Fabric. The results demonstrate several trade-offs in the design space, as well as big performance gaps between blockchain and database systems. Drawing from design principles of database systems, we discuss several research directions for bringing blockchain performance closer to the realm of databases.
- Person re-identification (Re-ID) aims at matching images of the same person across disjoint camera views, which is a challenging problem in the multimedia analysis, multimedia editing and content-based media retrieval communities. The major challenge lies in how to preserve the similarity of the same person across video footage with large appearance variations, while discriminating between different individuals. To address this problem, conventional methods usually consider the pairwise similarity between persons by only measuring the point-to-point (P2P) distance. In this paper, we propose to use deep learning to model a novel set-to-set (S2S) distance, in which the underlying objective focuses on preserving the compactness of intra-class samples for each camera view, while maximizing the margin between the intra-class set and the inter-class set. The S2S distance metric consists of three terms, namely the class-identity term, the relative distance term and the regularization term. The class-identity term keeps the intra-class samples within each camera view close together, the relative distance term maximizes the distance between the intra-class set and the inter-class set across different camera views, and the regularization term smooths the parameters of the deep convolutional neural network (CNN). As a result, the learned deep model can effectively find the match for a probe object among various candidates in the video gallery by learning discriminative and stable feature representations. Using the CUHK01, CUHK03, PRID2011 and Market1501 benchmark datasets, we conducted extensive comparative evaluations to demonstrate the advantages of our method over the state-of-the-art approaches.
- The data stream mining problem has attracted wide concern in the areas of machine learning and data mining. In some recent studies, ensemble classification has been widely used for concept drift detection; however, most of these studies regard classification accuracy as the criterion for judging whether concept drift has occurred. Information entropy is an important and effective method for measuring uncertainty. Based on information entropy theory, a new algorithm that uses information entropy to evaluate a classification result is developed. It uses ensemble classification techniques, and the weight of each classifier is decided by the entropy of the result produced by the ensemble classifier system. When the concept in the data stream changes, classifiers whose weights fall below a threshold are discarded at once to adapt to the new concept. In the experimental analysis section, six datasets and four algorithms are evaluated. The results show that the proposed method not only handles concept drift effectively, but also achieves better classification accuracy and time performance than the competing algorithms.
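The entropy-based weighting idea can be sketched as follows; the three example distributions, the particular weighting rule, and the threshold value are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (base 2) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# class distributions predicted by three ensemble members on the latest chunk
preds = [np.array([0.90, 0.05, 0.05]),   # confident  -> low entropy
         np.array([0.40, 0.35, 0.25]),   # uncertain  -> high entropy
         np.array([0.70, 0.20, 0.10])]

# weight each classifier by how far its entropy sits below the maximum log2(3)
ents = np.array([entropy(p) for p in preds])
weights = (np.log2(3) - ents) / (np.log2(3) - ents).sum()

# when the concept drifts, members whose weight falls below a threshold are dropped
threshold = 0.1
kept = [i for i, w in enumerate(weights) if w >= threshold]
```

Here the confident classifier dominates the ensemble and the high-entropy one falls below the threshold and is discarded.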
- Aug 10 2017 cs.CV arXiv:1708.02863v1 Region-based Convolutional Neural Network (CNN) detectors such as Faster R-CNN or R-FCN have already shown promising results for object detection by combining a region proposal subnetwork and a classification subnetwork. Although R-FCN achieves higher detection speed while keeping detection performance, the global structure information is ignored by its position-sensitive score maps. To fully explore the local and global properties, in this paper we propose a novel fully convolutional network, named CoupleNet, to couple the global structure with local parts for object detection. Specifically, the object proposals obtained by the Region Proposal Network (RPN) are fed into a coupling module which consists of two branches. One branch adopts position-sensitive RoI (PSRoI) pooling to capture the local part information of the object, while the other employs RoI pooling to encode the global and context information. Next, we design different coupling strategies and normalization schemes to make full use of the complementary advantages of the global and local branches. Extensive experiments demonstrate the effectiveness of our approach. We achieve state-of-the-art results on all three challenging datasets, i.e. a mAP of 82.7% on VOC07, 80.4% on VOC12, and 34.4% on COCO. Code will be made publicly available.
- Aug 08 2017 cs.CV arXiv:1708.01682v1 Modern image search systems require semantic understanding of images, and a key yet under-addressed problem is to learn a good metric for measuring the similarity between images. While deep metric learning has yielded impressive performance gains by extracting high-level abstractions from image data, a proper objective loss function becomes the central issue for boosting performance. In this paper, we propose a novel angular loss, which takes the angle relationship into account, for learning a better similarity metric. Whereas previous metric learning methods focus on optimizing the similarity (contrastive loss) or relative similarity (triplet loss) of image pairs, our proposed method aims at constraining the angle at the negative point of triplet triangles. Several favorable properties are observed when compared with conventional methods. First, scale invariance is introduced, improving the robustness of the objective against feature variance. Second, a third-order geometric constraint is inherently imposed, capturing more local structure of triplet triangles than the contrastive or triplet loss. Third, better convergence is demonstrated by experiments on three publicly available datasets.
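The hinge form commonly given for the angular loss can be sketched directly; the 45-degree default angle bound is an assumption chosen for illustration:

```python
import numpy as np

def angular_loss(xa, xp, xn, alpha_deg=45.0):
    """Hinge form of the angular loss for one (anchor, positive, negative)
    triplet: penalize ||xa - xp||^2 - 4 tan^2(alpha) ||xn - (xa + xp)/2||^2,
    which constrains the angle at the negative vertex of the triplet
    triangle to be at most alpha."""
    xa, xp, xn = map(np.asarray, (xa, xp, xn))
    t2 = np.tan(np.radians(alpha_deg)) ** 2
    c = (xa + xp) / 2.0                       # midpoint of the positive edge
    val = np.sum((xa - xp) ** 2) - 4.0 * t2 * np.sum((xn - c) ** 2)
    return float(max(0.0, val))
```

A negative sitting on the midpoint of the positive edge is maximally penalized, while one far from the anchor-positive pair incurs zero loss.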
- Aug 08 2017 cs.CG arXiv:1708.02114v2 A $k$-track layout of a graph consists of a vertex $k$-colouring, and a total order of each vertex colour class, such that between each pair of colour classes no two edges cross. A $k$-queue layout of a graph consists of a total order of the vertices, and a partition of the edges into $k$ sets such that no two edges in the same set are nested with respect to the vertex ordering. The track number (queue number) of a graph $G$ is the minimum $k$ such that $G$ has a $k$-track ($k$-queue) layout. This paper proves that every $n$-vertex plane graph has track and queue numbers bounded by a constant. The result implies that every plane graph has a 3D crossing-free straight-line grid drawing in $O(n)$ volume. The proof utilizes a novel graph partition technique.
- In this paper, we study several GAN-related topics mathematically, including the Inception score, label smoothing, gradient vanishing and the -log(D(x)) alternative. We show that the Inception score is actually equivalent to the Mode score, both consisting of two entropy terms and sharing the drawback of ignoring the prior distribution of the labels. We thus propose the AM score as an alternative that leverages cross-entropy and takes the reference distribution into account. Empirical results indicate that the AM score outperforms the Inception score. We study label smoothing, gradient vanishing and the -log(D(x)) alternative from the perspective of class-aware gradients, with which we show the exact problems that arise when applying label smoothing to fake samples along with the log(1-D(x)) generator loss, which was previously unclear, and, more importantly, show that the problem does not exist when using the -log(D(x)) generator loss.
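The Inception score itself, exp(E_x KL(p(y|x) || p(y))), is straightforward to compute from per-sample class posteriors; a minimal sketch (the small epsilon guard against log(0) is an implementation assumption):

```python
import numpy as np

def inception_score(pyx):
    """Inception score from an array of per-sample class posteriors p(y|x):
    exp( E_x KL( p(y|x) || p(y) ) ), where p(y) is the mean over samples."""
    pyx = np.asarray(pyx, dtype=float)
    py = pyx.mean(axis=0)                      # marginal label distribution
    kl = (pyx * (np.log(pyx + 1e-12) - np.log(py + 1e-12))).sum(axis=1)
    return float(np.exp(kl.mean()))
```

Two sanity checks: confident predictions spread uniformly over C classes give a score of C, and a fully collapsed generator (identical posteriors for every sample) gives 1.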
- Aug 07 2017 cs.CV arXiv:1708.01541v1 The traditional methods of image assessment, such as mean squared error (MSE), signal-to-noise ratio (SNR), and peak signal-to-noise ratio (PSNR), are all based on the absolute error of images. Pearson's inner-product correlation coefficient (PCC) is also commonly used to measure the similarity between images. The structural similarity (SSIM) index is another important measurement which has been shown to be more consistent with the human visual system (HVS). Although there are many essential differences among these image assessment measures, some important associations among them as cost functions in linear decomposition are discussed in this paper. Firstly, the bases selected from a basis set for a target vector are the same in the linear decomposition schemes with the different cost functions MSE, SSIM, and PCC. Moreover, for a target vector, the ratio of the corresponding affine parameters in the MSE-based linear decomposition scheme and the SSIM-based scheme is a constant, which is just the value of the PCC between the target vector and its estimated vector.
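The three measures can be computed side by side; the sketch below uses a single global window for SSIM, whereas the standard index averages the same formula over local windows:

```python
import numpy as np

def mse(x, y):
    """Mean squared error."""
    return float(np.mean((x - y) ** 2))

def pcc(x, y):
    """Pearson's correlation coefficient."""
    xm, ym = x - x.mean(), y - y.mean()
    return float((xm * ym).sum() / np.sqrt((xm ** 2).sum() * (ym ** 2).sum()))

def ssim_global(x, y, L=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM over the whole signal (the standard index
    averages this quantity over local windows)."""
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mx, my, vx, vy = x.mean(), y.mean(), x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cxy + c2)) /
                 ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))
```

Note that PCC is invariant under the affine map y = a*x + b (for a > 0), while MSE and SSIM are not, which is the kind of structural difference the paper's decomposition analysis relates.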
- Covert communication aims to hide the very existence of wireless transmissions in order to guarantee strong security in wireless networks. In this work, we examine the possibility and achievable performance of covert communication in one-way relay networks. Specifically, the relay is greedy and opportunistically transmits its own information to the destination covertly on top of forwarding the source's message, while the source tries to detect this covert transmission to discover the illegitimate usage of the resources (e.g., power, spectrum) allocated only for the purpose of forwarding the source's information. We propose two strategies for the relay to transmit its covert information, namely fixed-rate and fixed-power transmission schemes, for which the source's detection limits are analysed in terms of the false alarm and miss detection rates, and the achievable effective covert rates from the relay to the destination are derived. Our examination determines the conditions under which the fixed-rate transmission scheme outperforms the fixed-power transmission scheme, and vice versa, which enables the relay to achieve the maximum effective covert rate. Our analysis indicates that the relay has to forward the source's message to shield its covert transmission, and that the effective covert rate increases with its forwarding ability (e.g., its maximum transmit power).
- Aug 01 2017 cs.DC arXiv:1707.09488v1 In the era of the Internet of Things, with the explosive worldwide growth of electronic data volume and the associated need for processing, analysis, and storage of such a humongous volume of data, it has now become mandatory to exploit the power of massively parallel architectures for fast computation. Cloud computing provides a cheap source of such a computing framework for processing large volumes of data in real-time applications. It is, therefore, not surprising that cloud computing has become a buzzword in the computing fraternity over the last decade. This book presents some critical applications in cloud frameworks along with some innovative designs of algorithms and architectures for deployment in cloud environments. It is a valuable source of knowledge for researchers, engineers, practitioners, and graduate and doctoral students working in the field of cloud computing. It will also be useful for faculty members of graduate schools and universities.
- Jul 27 2017 cs.CV arXiv:1707.08289v1 Image matting plays an important role in image and video editing. However, the formulation of image matting is inherently ill-posed. Traditional methods usually employ user interaction, such as trimaps and strokes, to deal with the image matting problem, and cannot run on mobile phones in real time. In this paper, we propose a real-time automatic deep matting approach for mobile devices. By leveraging densely connected blocks and dilated convolution, a lightweight fully convolutional network is designed to predict a coarse binary mask for portrait images. A feathering block, which is edge-preserving and matting-adaptive, is further developed to learn the guided filter and transform the binary mask into an alpha matte. Finally, an automatic portrait animation system based on fast deep matting is built on mobile devices, which does not need any interaction and can realize real-time matting at 15 fps. The experiments show that the proposed approach achieves comparable results with the state-of-the-art matting solvers.
- Prior asymptotic performance analyses are based on the series expansion of the moment-generating function (MGF) or the probability density function (PDF) of the channel coefficients. However, these techniques fail for lognormal fading channels because the Taylor series of the PDF of a lognormal random variable is zero at the origin and the MGF does not have an explicit form. Although the lognormal fading model has been widely applied in wireless communications and free-space optical communications, few analytical tools are available to provide elegant performance expressions for correlated lognormal channels. In this work, we propose a novel framework to analyze the asymptotic outage probabilities of selection combining (SC), equal-gain combining (EGC) and maximum-ratio combining (MRC) over equally correlated lognormal fading channels. Based on these closed-form results, we reveal the following: i) the outage probability of EGC or MRC becomes an infinitely small quantity compared to that of SC at large signal-to-noise ratio (SNR); ii) channel correlation can result in an infinite performance loss at large SNR. More importantly, the analyses reveal insights into the long-standing problem of performance analysis over correlated lognormal channels at high SNR, and circumvent time-consuming Monte Carlo simulation and numerical integration.
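The Monte Carlo baseline that the closed-form results circumvent is easy to set up; the lognormal spread, equal correlation coefficient, and outage threshold below are toy assumptions chosen only to illustrate the three combiners:

```python
import numpy as np

rng = np.random.default_rng(1)
n, L, rho = 200_000, 3, 0.5            # trials, branches, equal correlation

# equally correlated Gaussians via a common factor:
# g_l = sqrt(rho)*u + sqrt(1-rho)*v_l gives corr(g_l, g_m) = rho for l != m
u = rng.normal(size=(n, 1))
v = rng.normal(size=(n, L))
g = np.sqrt(rho) * u + np.sqrt(1 - rho) * v
h = np.exp(g)                          # correlated lognormal amplitudes

gamma = h ** 2                         # per-branch instantaneous SNR (unit noise)
gamma_th = 0.1                         # outage threshold
out_sc  = float(np.mean(gamma.max(axis=1) < gamma_th))   # selection combining
out_egc = float(np.mean((h.sum(axis=1) ** 2) / L < gamma_th))  # equal gain
out_mrc = float(np.mean(gamma.sum(axis=1) < gamma_th))   # maximum ratio
```

Because the MRC output SNR dominates both the SC and EGC outputs sample by sample, its estimated outage is never larger than theirs.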
- Jul 26 2017 cs.LG arXiv:1707.07901v1 Adversarial learning has been successfully embedded into deep networks to learn transferable features, which reduce the distribution discrepancy between the source and target domains. Existing domain adversarial networks assume a fully shared label space across domains. In the presence of big data, there is strong motivation for transferring both classification and representation models from existing big domains to unknown small domains. This paper introduces partial transfer learning, which relaxes the shared label space assumption by assuming only that the target label space is a subspace of the source label space. Previous methods typically match the whole source domain to the target domain, and are thus prone to negative transfer in the partial transfer problem. We present the Selective Adversarial Network (SAN), which simultaneously circumvents negative transfer by selecting out the outlier source classes and promotes positive transfer by maximally matching the data distributions in the shared label space. Experiments demonstrate that our models exceed state-of-the-art results for partial transfer learning tasks on several benchmark datasets.
- Jul 26 2017 cs.CV arXiv:1707.07819v1 In this paper, we address the task of detecting semantic parts on partially occluded objects. We consider a scenario where the model is trained on non-occluded images but tested on occluded images. The motivation is that there is an infinite number of occlusion patterns in the real world, which cannot be fully covered in the training data. Models should therefore be inherently robust and adaptive to occlusions instead of fitting the occlusion patterns in the training data. Our approach detects semantic parts by accumulating the confidence of local visual cues. Specifically, the method uses a simple voting scheme, based on log-likelihood ratio tests and spatial constraints, to combine the evidence of local cues. These cues are called visual concepts, which are derived by clustering the internal states of deep networks. We evaluate our voting scheme on the VehicleSemanticPart dataset with dense part annotations. We randomly place two, three or four irrelevant objects onto the target object to generate testing images with various occlusions. Experiments show that our algorithm outperforms several competitors in semantic part detection when occlusions are present.
- Jul 25 2017 cs.CV arXiv:1707.07584v1 Foreground segmentation in video sequences is a classic topic in computer vision. Due to the lack of semantic and prior knowledge, it is difficult for existing methods to handle sophisticated scenes well. Therefore, in this paper, we propose an end-to-end two-stage deep convolutional neural network (CNN) framework for foreground segmentation in video sequences. In the first stage, a convolutional encoder-decoder sub-network is employed to reconstruct the background images and encode rich prior knowledge of background scenes. In the second stage, the reconstructed background and the current frame are input into a multi-channel fully convolutional sub-network (MCFCN) for accurate foreground segmentation. In the two-stage CNN, the reconstruction loss and the segmentation loss are jointly optimized. The background images and foreground objects are output simultaneously in an end-to-end way. Moreover, by incorporating prior semantic knowledge of foreground and background in the pre-training process, our method can suppress background noise while preserving the integrity of foreground objects. Experiments on CDNet 2014 show that our method outperforms the state-of-the-art by 4.9%.
- Jul 25 2017 cs.CV arXiv:1707.07256v1 In this paper, we address the problem of person re-identification, which refers to associating persons captured by different cameras. We propose a simple yet effective human part-aligned representation for handling the body part misalignment problem. Our approach decomposes the human body into regions (parts) which are discriminative for person matching, computes representations over these regions accordingly, and aggregates the similarities computed between the corresponding regions of a pair of probe and gallery images into the overall matching score. Our formulation, inspired by attention models, is a deep neural network modeling the three steps together, which is learnt by minimizing the triplet loss function without requiring body part labeling information. Unlike most existing deep learning algorithms that learn a global or spatial partition-based local representation, our approach performs human body partition, and is thus more robust to pose changes and various human spatial distributions in the person bounding box. Our approach shows state-of-the-art results on standard datasets, Market-$1501$, CUHK$03$, CUHK$01$ and VIPeR.
- Jul 24 2017 cs.DS arXiv:1707.06779v1 In the claw, diamond-free edge deletion problem, we are given a graph $G$ and an integer $k>0$, and the question is whether there are at most $k$ edges whose deletion results in a graph without claws or diamonds as induced subgraphs. Based on some refined observations, we propose a kernel of $O(k^3)$ vertices and $O(k^4)$ edges, significantly improving the previous kernel of $O(k^{12})$ vertices and $O(k^{24})$ edges. In addition, we derive an $O^*(3.792^k)$-time algorithm for the claw, diamond-free edge deletion problem.
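For intuition about the forbidden structures, an induced claw ($K_{1,3}$) can be recognized by brute force; this illustrative check is unrelated to the paper's kernelization, and an induced diamond ($K_4$ minus one edge) would be detected analogously:

```python
from itertools import combinations

def has_induced_claw(adj):
    """Brute-force test for an induced claw (K_{1,3}): a centre vertex with
    three pairwise non-adjacent neighbours. adj maps vertex -> set of
    neighbours (an undirected graph as an adjacency dictionary)."""
    for c in adj:
        for a, b, d in combinations(sorted(adj[c]), 3):
            if b not in adj[a] and d not in adj[a] and d not in adj[b]:
                return True
    return False

star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}              # K_{1,3} itself
k4 = {v: {u for u in range(4) if u != v} for v in range(4)}  # complete graph
```

The star contains a claw by definition, while in $K_4$ every triple of neighbours is pairwise adjacent, so no induced claw exists.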
- Non-orthogonal multiple access (NOMA) enables power-domain multiplexing via successive interference cancellation (SIC) and has been viewed as a promising technology for 5G communication. The full benefit of NOMA depends on resource allocation, including power allocation and channel assignment, for all users, which, however, leads to mixed integer programs. In the literature, the optimal power allocation has only been found in some special cases, while the joint optimization of power allocation and channel assignment generally requires exhaustive search. In this paper, we investigate resource allocation in downlink NOMA systems. As the main contribution, we analytically characterize the optimal power allocation with given channel assignment over multiple channels under different performance criteria. Specifically, we consider maximin fairness, weighted sum rate maximization, sum rate maximization with quality of service (QoS) constraints, and energy efficiency maximization with weights or QoS constraints in NOMA systems. We also explicitly take into account the order constraints on the powers of the users on each channel, which are often ignored in the existing works, and show that they have a significant impact on SIC in NOMA systems. Then, we provide the optimal power allocation for the considered criteria in closed or semi-closed form. We also propose a low-complexity efficient method to jointly optimize channel assignment and power allocation in NOMA systems by incorporating a matching algorithm with the optimal power allocation. Simulation results show that the joint resource optimization using our optimal power allocation yields better performance than the existing schemes.
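For a single channel with two users, the textbook SIC rate expressions and the power order constraint read as follows; the power split, channel gains, and unit noise are toy assumptions for illustration:

```python
import numpy as np

def noma_rates(p_weak, p_strong, g_weak, g_strong, noise=1.0):
    """Achievable rates for two-user downlink NOMA with SIC. The weak user
    (smaller gain) is allocated more power and decodes its signal treating
    the strong user's as noise; the strong user cancels the weak user's
    signal first. The constraint p_weak >= p_strong is the power order
    constraint that keeps SIC feasible."""
    assert g_weak <= g_strong and p_weak >= p_strong
    r_weak = np.log2(1 + p_weak * g_weak / (p_strong * g_weak + noise))
    r_strong = np.log2(1 + p_strong * g_strong / noise)   # after cancellation
    return float(r_weak), float(r_strong)
```

Shifting power from the weak user to the strong one trades the weak user's rate for the strong user's, which is exactly the tension the paper's optimal allocations resolve under each criterion.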
- Jul 20 2017 cs.CV arXiv:1707.05974v1 Identity transformations, used as skip-connections in residual networks, directly connect convolutional layers close to the input and those close to the output in deep neural networks, improving information flow and thus easing the training. In this paper, we introduce two alternative linear transforms: orthogonal transformation and idempotent transformation. By the definition and properties of orthogonal and idempotent matrices, the product of multiple orthogonal (respectively, the same idempotent) matrices, used to form the linear transformations, is equal to a single orthogonal (idempotent) matrix, so that information flow is improved and the training is eased. One interesting point is that the success essentially stems from feature reuse and gradient reuse in forward and backward propagation: the express way through skip-connections maintains information during flow and eliminates the gradient vanishing problem. We empirically demonstrate the effectiveness of the proposed two transformations: similar performance in single-branch networks and even superior performance in multi-branch networks in comparison to identity transformations.
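The two matrix facts the construction relies on are easy to verify numerically; this sketch uses random 4x4 matrices, with QR factors standing in for orthogonal transforms and a pseudoinverse projector for an idempotent one:

```python
import numpy as np

rng = np.random.default_rng(0)

# the product of orthogonal matrices is again orthogonal, so a chain of
# orthogonal transforms composes to a single orthogonal transform
Qs = [np.linalg.qr(rng.normal(size=(4, 4)))[0] for _ in range(5)]
P = np.linalg.multi_dot(Qs)
ortho_ok = np.allclose(P.T @ P, np.eye(4))

# every power of an idempotent matrix equals the matrix itself, so a chain
# of the same idempotent transform also composes to a single one
A = rng.normal(size=(4, 2))
E = A @ np.linalg.pinv(A)            # orthogonal projector: E @ E = E
idem_ok = np.allclose(np.linalg.matrix_power(E, 5), E)
```

Both properties mean that stacking many such layers cannot blow up or annihilate the signal the way an arbitrary matrix product can, which is the information-flow argument in the abstract.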
- Deep neural networks have shown effectiveness in many challenging tasks and have proved their strong capability of automatically learning good feature representations from raw input. Nonetheless, designing their architectures still requires much human effort. Techniques for automatically designing neural network architectures, such as reinforcement learning based approaches, have recently shown promising results in benchmarks. However, these methods still train each network from scratch while exploring the architecture space, which results in extremely high computational cost. In this paper, we propose a novel reinforcement learning framework for automatic architecture design, where the action is to grow the network depth or layer width based on the current network architecture with function preserved. As such, previously validated networks can be reused for further exploration, thus saving a large amount of computational cost. Experiments on image benchmark datasets have demonstrated the efficiency and effectiveness of our proposed solution compared to existing automatic architecture design methods.
- Sensor-based activity recognition seeks high-level knowledge about human activity from a multitude of low-level sensor readings. Conventional pattern recognition approaches have made tremendous progress in the past years. However, most of those approaches rely heavily on heuristic hand-crafted feature extraction, which dramatically hinders their generalization performance. Additionally, those methods often produce unsatisfactory results for unsupervised and incremental learning tasks. Meanwhile, the recent advancement of deep learning makes it possible to perform automatic high-level feature extraction and thus achieve promising performance in many areas. Deep learning based methods have therefore been widely adopted for sensor-based activity recognition tasks. In this paper, we survey and highlight the recent advances in deep learning approaches for sensor-based activity recognition. Specifically, we summarize the existing literature from three aspects: sensor modality, deep model and application. We also present a detailed discussion and propose grand challenges for future directions.
- Jul 13 2017 cs.DC arXiv:1707.03527v1 Selective bulk analyses, such as statistical learning on temporal/spatial data, are fundamental to a wide range of contemporary data analysis. However, with increasingly large datasets, such as weather data and marketing transactions, data organization and access become more challenging in selective bulk data processing with current big data processing frameworks such as Spark or key-value stores. In this paper, we propose a method, referred to as Oseba, to optimize selective bulk analyses in big data processing. Oseba maintains a super index for the data organization in memory to support fast lookup by targeting only the data involved in each selective analysis program. Oseba saves memory as well as computation in comparison to the default data processing frameworks.
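The idea of a memory-resident index that lets a selective analysis touch only its partitions can be sketched with a plain dictionary; Oseba's actual super index and its integration with Spark are more involved, and the record schema here is invented purely for illustration:

```python
from collections import defaultdict

# toy memory-resident "super index": bucket records by (region, day) so a
# selective analysis reads only the partitions it targets, not the full set
index = defaultdict(list)
records = [("us", "2017-07-01", 3.2), ("eu", "2017-07-01", 1.1),
           ("us", "2017-07-02", 2.7), ("us", "2017-07-01", 4.0)]
for region, day, value in records:
    index[(region, day)].append(value)

# selective bulk query: mean of the "us" readings on 2017-07-01, no full scan
sel = index[("us", "2017-07-01")]
mean_us = sum(sel) / len(sel)
```

The lookup cost depends only on the size of the matching partition, which is the point of indexing the data organization rather than scanning it per query.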
- Jul 11 2017 cs.CV arXiv:1707.02725v2 In this paper, we present a simple and modularized neural network architecture, named interleaved group convolutional neural networks (IGCNets). The main point lies in a novel building block, a pair of successive interleaved group convolutions: a primary group convolution and a secondary group convolution. The two group convolutions are complementary: (i) the convolution on each partition in the primary group convolution is a spatial convolution, while on each partition in the secondary group convolution it is a point-wise convolution; (ii) the channels in the same secondary partition come from different primary partitions. We discuss one representative advantage: the block is wider than a regular convolution while preserving the number of parameters and the computational complexity. We also show that regular convolutions, group convolution with summation fusion, and the Xception block are special cases of interleaved group convolutions. Empirical results on standard benchmarks, CIFAR-$10$, CIFAR-$100$, SVHN and ImageNet, demonstrate that our networks use parameters and computation more efficiently at similar or higher accuracy.
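Property (ii), that each secondary partition draws channels from different primary partitions, is exactly a channel-shuffle permutation between the two group convolutions; a small sketch with toy group sizes:

```python
import numpy as np

def interleave(x, primary_groups, secondary_groups):
    """Channel permutation between the two group convolutions of an IGC
    block: reshape the channel axis to (primary_groups, secondary_groups)
    and transpose, so each secondary partition draws exactly one channel
    from every primary partition."""
    n = x.shape[0]
    assert n == primary_groups * secondary_groups
    return x.reshape(primary_groups, secondary_groups).T.reshape(n)

# channels 0..5; primary partitions are [0, 1, 2] and [3, 4, 5]
x = np.arange(6)
shuffled = interleave(x, primary_groups=2, secondary_groups=3)
# consecutive pairs of `shuffled` form the secondary partitions:
# [0, 3], [1, 4], [2, 5] - one channel from each primary partition
```

The same reshape-transpose trick appears in other shuffle-style architectures; here it is what makes the pointwise secondary convolution mix information across primary partitions.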
- Jul 06 2017 cs.AI arXiv:1707.01310v3 In typical reinforcement learning (RL), the environment is assumed given, and the goal of learning is to identify an optimal policy for the agent taking actions through its interactions with the environment. In this paper, we extend this setting by considering an environment that is not given but is controllable and learnable through its interaction with the agent. Theoretically, we find a dual Markov decision process (MDP) w.r.t. the environment to that w.r.t. the agent, and solving the dual MDP-policy pair yields a policy gradient solution for optimizing the parametrized environment. Furthermore, environments with discontinuous parameters are addressed by a proposed general generative framework. While the idea is illustrated by an extended two-agent rock-paper-scissors game, our experiments on a Maze game design task show the effectiveness of the proposed algorithm in generating diverse and challenging Mazes against different agents with various settings.
- Jul 04 2017 cs.SI physics.soc-ph arXiv:1707.00386v1 When analyzing the statistical and topological characteristics of complex networks, an effective and convenient approach is to compute centralities that recognize influential and significant nodes or structures. Node centralities are widely studied and depict networks from certain perspectives with great efficiency, yet most of them are restricted to the local environment or specific configurations and are hard to generalize to structural patterns of networks. In this paper we propose a new centrality for nodes and motifs based on the von Neumann entropy, which allows us to investigate the importance of nodes or structural patterns from the viewpoint of structural complexity. By calculating and comparing similarities of this centrality with classical ones, we show that the von Neumann entropy node centrality is an all-round index for selecting crucial nodes, able to evaluate and summarize the performance of other centralities. Furthermore, when the analysis is generalized to motifs to obtain the von Neumann entropy motif centrality, the all-round property is kept, the structural information is sufficiently reflected by integrating nodes and connections, and the high-centrality motifs found by this mechanism have greater impact on the networks than the high-centrality single nodes found by classical node centralities. This methodology reveals the influence of various structural patterns on the regularity and complexity of networks, providing a fresh perspective for studying networks and great potential for discovering essential structural features.
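The von Neumann entropy of a graph is commonly defined by treating the trace-normalized Laplacian as a density matrix; a minimal numpy sketch of that quantity follows (the paper's exact centrality construction on top of this entropy is not reproduced here):

```python
import numpy as np

# Von Neumann graph entropy: treat the scaled Laplacian rho = L / tr(L)
# as a density matrix and take S = -sum(lam * log(lam)) over its
# eigenvalues. How the paper turns this into a node/motif centrality
# is not specified in the abstract, so only the entropy itself is shown.
def von_neumann_entropy(A):
    L = np.diag(A.sum(axis=1)) - A          # combinatorial Laplacian
    rho = L / np.trace(L)                   # unit-trace "density matrix"
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]                  # convention: 0 * log 0 = 0
    return float(-(lam * np.log(lam)).sum())

# Triangle graph: Laplacian eigenvalues 0, 3, 3 give rho eigenvalues
# 0, 1/2, 1/2, so the entropy is ln 2.
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)
print(round(von_neumann_entropy(A), 3))  # -> 0.693 (= ln 2)
```

A natural centrality in this spirit scores a node or motif by the entropy change when it is removed, which matches the abstract's framing of importance "in the view of structural complexity".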
- Jul 04 2017 cs.SY arXiv:1707.00672v1 The large deployment of Information and Communication Technology (ICT) exposes the power grid to a large number of coordinated cyber-attacks. Thus, it is necessary to design new security policies that allow efficient and reliable operation in such a contested cyber-space. The detection of cyber-attacks is known to be a challenging problem; however, through the coordinated effort of defense-in-depth tools (e.g., Intrusion Detection Systems (IDSs), firewalls, etc.) together with grid context information, the grid's real security situation can be estimated. In this paper, we derive a Correlation Index (CI) using grid context information (i.e., analytical models of attack goals and grid responses). The CI reflects the spatial correlation of cyber-attacks and the physical grid, i.e., it indicates the target cyber-devices associated with attack goals. This is particularly important to identify (together with intrusion data from IDSs) coordinated cyber-attacks that aim to manipulate static power applications and ultimately cause severe consequences on the grid. In particular, the proposed CI, its properties, and its defense implications are analytically derived and numerically tested for the Security Constrained Economic Dispatch (SCED) control loop subject to measurement attacks. However, our results can be extended to other static power applications, such as Reactive Power support, Optimal Power Flow, etc.
- Jul 04 2017 cs.CV arXiv:1707.00409v2 Person re-identification aims to match images of the same person across disjoint camera views, which is a challenging problem in video surveillance. The major challenge lies in how to preserve the similarity of the same person against large variations caused by complex backgrounds, mutual occlusions and different illuminations, while discriminating between different individuals. In this paper, we present a novel deep ranking model with feature learning and fusion that learns a large adaptive margin between the intra-class distance and inter-class distance to solve the person re-identification problem. Specifically, we organize the training images into a batch of pairwise samples. Treating these pairwise samples as inputs, we build a novel part-based deep convolutional neural network (CNN) to learn layered feature representations that preserve a large adaptive margin. As a result, the learned model can effectively find the matched target to an anchor image among a number of candidates in the gallery image set through discriminative and stable feature representations. Overcoming the weaknesses of conventional fixed-margin loss functions, our adaptive margin loss function is more appropriate for the dynamic feature space. On four benchmark datasets, PRID2011, Market1501, CUHK01 and 3DPeS, we conduct extensive comparative evaluations to demonstrate the advantages of the proposed method over state-of-the-art approaches to person re-identification.
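For context, the conventional fixed-margin ranking loss that this paper generalizes can be sketched in a few lines; the paper's adaptive margin replaces the constant `m` below, but its exact form is not given in the abstract:

```python
import numpy as np

# Generic fixed-margin triplet ranking loss for re-identification: push
# the inter-class distance to exceed the intra-class distance by at
# least a margin m. The adaptive-margin variant in the paper replaces
# the constant m; this baseline is only for illustration.
def triplet_loss(anchor, positive, negative, m=1.0):
    d_ap = np.sum((anchor - positive) ** 2)   # intra-class distance
    d_an = np.sum((anchor - negative) ** 2)   # inter-class distance
    return max(0.0, m + d_ap - d_an)          # hinge on the margin

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # same identity, close in feature space
n = np.array([2.0, 0.0])   # different identity, far away
print(triplet_loss(a, p, n))  # 0.0: the margin is already satisfied
```

The weakness the abstract points to is visible here: a single fixed `m` cannot suit both tight and spread-out regions of a feature space that changes during training.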
- Electronic payment protocols play a vital role in electronic commerce security, which is essential for the secure operation of electronic commerce activities. Formal methods are an effective way to verify the security of protocols, but current formal methods lack the ability to describe and analyze timeliness in electronic payment protocols. To improve this analysis ability, this paper proposes a novel approach to analyzing security properties such as accountability, fairness and timeliness in electronic payment protocols. The approach extends an existing logical method by adding a concise time expression and analysis method, enabling it to describe event times and extending the time characteristics of the logical inference rules. We analyzed the Netbill protocol with the new approach and found that the fairness of the protocol is not satisfied, due to a timeliness problem. The result illustrates that the new approach is able to analyze the key properties of electronic payment protocols. Furthermore, the new approach can be applied to analyze other time properties of cryptographic protocols.
- In millimeter wave (mmWave) systems, antenna architecture limitations make it difficult to apply conventional fully digital precoding techniques and instead call for low-cost hybrid methods combining analog radio-frequency (RF) and digital baseband precoding. This paper investigates joint RF-baseband hybrid precoding for the downlink of multiuser multi-antenna mmWave systems with a limited number of RF chains. Two performance measures, maximizing the spectral efficiency and the energy efficiency of the system, are considered. We propose a codebook-based RF precoding design and obtain the channel state information via a beam sweep procedure. Via the codebook-based design, the original system is transformed into a virtual multiuser downlink system with the RF chain constraint. Consequently, we are able to simplify the complicated hybrid precoding optimization problems to joint codeword selection and precoder design (JWSPD) problems. Then, we propose efficient methods to address the JWSPD problems and jointly optimize the RF and baseband precoders under the two performance measures. Finally, extensive numerical results are provided to validate the effectiveness of the proposed hybrid precoders.
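The beam-sweep step of a codebook-based design can be sketched with a DFT codebook: each codeword is tried and the one with the largest beamforming gain is kept. The codebook size and channel model below are illustrative assumptions, and the paper's joint codeword-selection-and-precoder-design (JWSPD) optimization is not reproduced:

```python
import numpy as np

# Toy codebook-based RF beam selection: sweep a DFT codebook and keep
# the codeword with the largest beamforming gain |h^H f|.
N = 8                                   # antennas at the transmitter
angles = np.arange(N) / N               # DFT codebook directions
F = np.exp(2j * np.pi * np.outer(np.arange(N), angles)) / np.sqrt(N)

rng = np.random.default_rng(1)
h = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # toy channel

gains = np.abs(h.conj() @ F)            # beam sweep: one gain per codeword
best = int(np.argmax(gains))
print(best, round(float(gains[best]), 3))
```

Restricting the analog precoder to such a finite codebook is what turns the continuous hybrid precoding problem into the discrete codeword-selection problem the paper then optimizes jointly with the baseband precoder.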
- We derive tight and computable bounds on the bias of statistical estimators, or more generally of quantities of interest, when evaluated on a baseline model P rather than on the typically unknown true model Q. Our proposed method combines the scalable information inequality derived by P. Dupuis, K. Chowdhary, the authors and their collaborators with classical concentration inequalities (such as Bennett's and the Hoeffding-Azuma inequalities). Our bounds are expressed in terms of the Kullback-Leibler divergence R(Q||P) of model Q with respect to P and the moment generating function of the statistical estimator under P. Furthermore, concentration inequalities, i.e. bounds on moment generating functions, provide tight and computationally inexpensive model bias bounds for quantities of interest. Finally, they allow us to derive rigorous confidence bands for statistical estimators that account for model bias and are valid for an arbitrary amount of data.
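The information inequality referenced here is presumably the Donsker-Varadhan-based bound of Dupuis and collaborators; a hedged reconstruction (the paper's exact form and constants may differ) is:

```latex
% Variational (Donsker-Varadhan) bias bound; the paper's exact form may differ.
\mathbb{E}_Q[f] - \mathbb{E}_P[f]
  \;\le\; \inf_{c>0}\,\frac{1}{c}\Bigl(
      \log \mathbb{E}_P\!\bigl[e^{\,c\,(f-\mathbb{E}_P[f])}\bigr]
      + R(Q\,\|\,P) \Bigr)
```

Replacing the log-moment-generating term with a concentration bound, e.g. Hoeffding's $c^2(b-a)^2/8$ for $f \in [a,b]$, and optimizing over $c$ then yields a fully computable bias bound of order $(b-a)\sqrt{R(Q\|P)/2}$, which matches the abstract's claim that concentration inequalities make the bounds computationally inexpensive.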
- We present novel minibatch stochastic optimization methods for empirical risk minimization problems; the methods efficiently leverage variance-reduced first-order and sub-sampled higher-order information to accelerate convergence. For quadratic objectives, we prove improved iteration complexity over the state-of-the-art under reasonable assumptions. We also provide empirical evidence of the advantages of our method compared to existing approaches in the literature.
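As a point of reference for "variance-reduced first-order information", an SVRG-style gradient estimator on a quadratic objective looks as follows. This is a representative baseline under assumed data and step size, not necessarily the paper's update rule:

```python
import numpy as np

# SVRG-style variance-reduced stochastic gradient on a least-squares
# objective f(w) = (1/n) sum_i 0.5 * (a_i^T w - b_i)^2. Representative
# of variance-reduced first-order methods, not the paper's exact scheme.
rng = np.random.default_rng(0)
n, d = 50, 3
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def grad_i(w, i):
    return A[i] * (A[i] @ w - b[i])        # single-sample gradient

def full_grad(w):
    return A.T @ (A @ w - b) / n           # full-batch gradient

w = np.zeros(d)
for epoch in range(20):
    w_snap = w.copy()
    mu = full_grad(w_snap)                 # anchor: full gradient at snapshot
    for _ in range(n):
        i = rng.integers(n)
        # control variate: unbiased, with variance shrinking near the optimum
        g = grad_i(w, i) - grad_i(w_snap, i) + mu
        w -= 0.01 * g

print(float(np.linalg.norm(full_grad(w))))  # small residual gradient norm
```

The estimator stays unbiased while its variance vanishes as `w` approaches the optimum, which is what enables the linear convergence rates these methods improve upon.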
- Jun 15 2017 cs.DB arXiv:1706.04266v3 Similarity join, which finds similar objects (e.g., products, names, addresses) across different sources, is powerful in dealing with variety in big data, especially web data. Threshold-driven similarity join, which has been extensively studied in the past, assumes that a user is able to specify a similarity threshold, and then focuses on how to efficiently return the object pairs whose similarities pass the threshold. We argue that the assumption of a well-set similarity threshold may not be valid, for two reasons: the optimal thresholds for different similarity join tasks may vary considerably, and the end-to-end time spent on similarity join is likely to be dominated by a back-and-forth threshold-tuning process. In response, we propose preference-driven similarity join. The key idea is to provide several result-set preferences, rather than a range of thresholds, for a user to choose from. Intuitively, a result-set preference can be considered an objective function capturing a user's preference over a similarity join result. Once a preference is chosen, we automatically compute the similarity join result that optimizes the preference objective. As a proof of concept, we devise two useful preferences and propose a novel preference-driven similarity join framework coupled with effective optimization techniques. Our approaches are evaluated on four real-world web datasets from a diverse range of application scenarios. The experiments show that preference-driven similarity join can achieve high-quality results without a tedious threshold-tuning process.
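The threshold-driven baseline the paper argues against hand-tuning can be stated in a few lines; the data and threshold below are illustrative, and real systems use indexing and filtering rather than this quadratic scan:

```python
# Minimal threshold-driven set-similarity join: return the pairs of
# records whose Jaccard similarity passes a user-chosen threshold t.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def threshold_join(records, t):
    out = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if jaccard(records[i], records[j]) >= t:
                out.append((i, j))
    return out

names = [{"apple", "inc"}, {"apple", "incorporated"}, {"banana", "co"}]
print(threshold_join(names, 0.3))  # [(0, 1)]
```

Note how sensitive the result is to `t`: the matched pair has similarity 1/3, so raising the threshold to 0.5 empties the result entirely. That sensitivity is exactly the threshold-tuning burden the preference-driven formulation removes.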
- Monte Carlo tree search (MCTS) is extremely popular in computer Go, determining each action by enormous simulations in a broad and deep search tree. However, human experts select most actions by pattern analysis and careful evaluation rather than brute search over millions of future interactions. In this paper, we propose a computer Go system that follows the experts' way of thinking and playing. Our system consists of two parts. The first part is a novel deep alternative neural network (DANN) used to generate candidates for the next move. Compared with existing deep convolutional neural networks (DCNNs), DANN inserts a recurrent layer after each convolutional layer and stacks them in an alternating manner. We show that such a setting can preserve more context of local features and their evolution, which is beneficial for move prediction. The second part is a long-term evaluation (LTE) module used to provide a reliable evaluation of candidates rather than a single probability from the move predictor. This is consistent with the nature of human experts' play, since they can foresee tens of steps ahead to give an accurate estimation of candidates. In our system, for each candidate, LTE calculates a cumulative reward after several future interactions, once local variations are settled. Combining criteria from the two parts, our system determines the optimal choice of next move. For more comprehensive experiments, we introduce a new professional Go dataset (PGD), consisting of 253233 professional records. Experiments on the GoGoD and PGD datasets show that DANN substantially improves move prediction over a pure DCNN. When combined with LTE, our system outperforms most relevant approaches and open engines based on MCTS.
- We address the following problem: how do we incorporate user-item interaction signals as part of the relevance model in a large-scale personalized recommendation system such that (1) the ability to interpret the model and explain recommendations is retained, and (2) the existing infrastructure designed for the (user profile) content-based model can be leveraged? We propose Dionysius, a hierarchical graphical-model based framework and system for incorporating user interactions into recommender systems with minimal change to the underlying infrastructure. We learn a hidden fields vector for each user by considering the hierarchy of interaction signals, and replace the user profile-based vector with this learned vector, thereby not expanding the feature space at all. Thus, our framework allows the use of existing recommendation infrastructure that supports content-based features. We implemented and deployed this system as part of the recommendation platform at LinkedIn for more than one year. We validated the efficacy of our approach through extensive offline experiments with different model choices, as well as online A/B testing experiments. Our deployment of this system as part of the job recommendation engine resulted in significant improvement in the quality of retrieved results, thereby generating an improved user experience and positive impact for millions of users.
- Jun 14 2017 cs.CV arXiv:1706.04124v2 In this work, we focus on a challenging task: synthesizing multiple imaginary videos given a single image. The major problems come from the high dimensionality of pixel space and the ambiguity of potential motions. To overcome these problems, we propose a new framework that produces imaginary videos by transformation generation. The generated transformations are applied to the original image in a novel volumetric merge network to reconstruct the frames of the imaginary video. By sampling different latent variables, our method can output different imaginary video samples. The framework is trained adversarially with unsupervised learning. For evaluation, we propose a new assessment metric $RIQA$. In experiments, we test on 3 datasets ranging from synthetic data to natural scenes. Our framework achieves promising performance in image quality assessment, and visual inspection indicates that it can successfully generate diverse five-frame videos of acceptable perceptual quality.
- Jun 14 2017 cs.CV arXiv:1706.03947v1 This paper considers the challenging task of long-term video interpolation. Unlike most existing methods that only generate a few intermediate frames between existing adjacent ones, we attempt to speculate or imagine the procedure of an episode and generate multiple frames between two non-consecutive frames of a video. In this paper, we present a novel deep architecture called the bidirectional predictive network (BiPN), which predicts intermediate frames from two opposite directions. The bidirectional architecture allows the model to learn scene transformations over time as well as generate longer video sequences. Moreover, our model can be extended to predict multiple possible procedures by sampling different noise vectors. A joint loss composed of clues in image and feature spaces and an adversarial loss is designed to train our model. We demonstrate the advantages of BiPN on two benchmarks, Moving 2D Shapes and UCF101, and report competitive results compared with recent approaches.
- We investigate the weighted-sum distortion minimization problem in transmitting two correlated Gaussian sources over Gaussian channels using two energy harvesting nodes. To this end, we develop offline and online power control policies to optimize the transmit power of the two nodes. In the offline case, we cast the problem as a convex optimization and investigate the structure of the optimal solution. We also develop a generalized water-filling based power allocation algorithm to obtain the optimal solution efficiently. For the online case, we quantify the distortion of the system using a cost function and show that the expected cost equals the expected weighted-sum distortion. Based on Banach's fixed-point theorem, we further propose a geometrically converging algorithm that finds the minimum cost via simple iterations. Simulation results show that our online power control outperforms the greedy power control, in which each node uses all the available energy in each slot, and performs close to the proposed offline power control. Moreover, the performance of our offline power control almost coincides with the performance limit of the system.
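The "simple iterations" guaranteed by Banach's fixed-point theorem have the generic shape below; the contraction `T` here is a toy stand-in for the paper's cost operator, chosen only to show the geometric convergence:

```python
# Generic fixed-point iteration in the spirit of a Banach-fixed-point
# argument: repeatedly apply a contraction T until the iterates settle.
# Convergence is geometric with rate equal to T's contraction modulus.
def fixed_point(T, x0, tol=1e-10, max_iter=1000):
    x = x0
    for _ in range(max_iter):
        x_new = T(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

T = lambda x: 0.5 * x + 1.0   # toy contraction with modulus 0.5
print(round(fixed_point(T, 0.0), 6))  # -> 2.0, the unique fixed point
```

Because the modulus is 0.5, the error halves every iteration regardless of the starting point, which is the geometric convergence property the abstract invokes.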
- Jun 06 2017 cs.HC arXiv:1706.01248v1 In this paper, we investigate the effectiveness of two distinct techniques (the Special Moment Approach and the Spatial Frequency Approach) for reviewing lifelogs, which were collected by lifeloggers who were willing to use a wearable camera and a bracelet simultaneously for two days. The Special Moment Approach is a technique for extracting episodic events, and the Spatial Frequency Approach is a technique for associating visual information with temporal and location information; in particular, a heat map is applied to the spatial data to express frequency awareness. The participants were asked to fill in two post-study questionnaires evaluating the effectiveness of the two techniques and their combination. The preliminary results showed the positive potential of exploring individual lifelogs using our approaches.
- May 31 2017 cs.CV arXiv:1705.10659v1 Automatically discovering the semantic structure of tagged visual data (e.g. web videos and images) is important for visual data analysis and interpretation, enabling machine intelligence to effectively process the fast-growing amount of multi-media data. However, this is non-trivial due to the need to jointly learn the underlying correlations between heterogeneous visual and tag data. The task is made more challenging by inherently sparse and incomplete tags. In this work, we develop a method for modelling the inherent visual data concept structures based on a novel Hierarchical-Multi-Label Random Forest model capable of correlating structured visual and tag information so as to more accurately interpret the visual semantics, e.g. disclosing meaningful visual groups with similar high-level concepts, and recovering missing tags for individual visual data samples. Specifically, our model exploits hierarchically structured tags of different semantic abstractness and multiple tag statistical correlations, in addition to modelling visual and tag interactions. As a result, our model is able to discover more accurate semantic correlations between textual tags and visual features, finally providing favourable visual semantics interpretation even with highly sparse and incomplete tags. We demonstrate the advantages of our proposed approach in two fundamental applications, visual data clustering and missing tag completion, on benchmark video (i.e. TRECVID MED 2011) and image (i.e. NUS-WIDE) datasets.
- This paper provides a unified account of two schools of thinking in information retrieval modelling: generative retrieval, which focuses on predicting relevant documents given a query, and discriminative retrieval, which focuses on predicting relevancy given a query-document pair. We propose a game-theoretical minimax game to iteratively optimise both models. On one hand, the discriminative model, aiming to mine signals from labelled and unlabelled data, provides guidance to train the generative model towards fitting the underlying relevance distribution over documents given the query. On the other hand, the generative model, acting as an attacker to the current discriminative model, generates difficult examples for the discriminative model in an adversarial way by minimising its discrimination objective. With the competition between these two models, we show that the unified framework takes advantage of both schools of thinking: (i) the generative model learns to fit the relevance distribution over documents via the signals from the discriminative model, and (ii) the discriminative model is able to exploit the unlabelled data selected by the generative model to achieve a better estimation for document ranking. Our experimental results demonstrate significant performance gains of as much as 23.96% on Precision@5 and 15.50% on MAP over strong baselines in a variety of applications including web search, item recommendation, and question answering.